linux-mm.kvack.org archive mirror
* A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
@ 2018-08-17 19:44 Mikulas Patocka
  2018-08-21 10:44 ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2018-08-17 19:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon; +Cc: linux-arm-kernel, linux-mm

Hi

I report this crash on ARM64 on the kernel 4.17.11. The reason is that the 
function move_freepages_block accesses contiguous runs of 
pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there 
and when move_freepages_block stumbles over this hole, it accesses 
uninitialized page structures and crashes.

00000000-03ffffff : System RAM
  00080000-007bffff : Kernel code
  00820000-00aa3fff : Kernel data
04200000-bf80ffff : System RAM
bf810000-bfbeffff : reserved
bfbf0000-bfc8ffff : System RAM
bfc90000-bffdffff : reserved
bffe0000-bfffffff : System RAM
c0000000-dfffffff : MEM
  c0000000-c00fffff : PCI Bus 0000:01
    c0000000-c0003fff : 0000:01:00.0
      c0000000-c0003fff : nvme
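
To see why a single pageblock can straddle System RAM and the reserved hole
in the listing above, the rounding done by move_freepages_block can be
replayed in user space. This is only a rough sketch, not kernel code: the 4k
page size is taken from the oops below, the pageblock order of 9 (2 MiB
blocks) is an assumption about this config, and pageblock_of() and
pfns_overlapping() are made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED	12	/* 4k pages, as reported in the oops */
#define PAGEBLOCK_ORDER_ASSUMED	9	/* assumption: 512 pages = 2 MiB per pageblock */
#define PAGEBLOCK_NR_PAGES	(1ULL << PAGEBLOCK_ORDER_ASSUMED)

/* Inclusive range of page frame numbers. */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/* Round a physical address down/up to the pageblock that contains it. */
static struct pfn_range pageblock_of(uint64_t phys)
{
	uint64_t pfn = phys >> PAGE_SHIFT_ASSUMED;
	struct pfn_range r;

	r.start = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	r.end = r.start + PAGEBLOCK_NR_PAGES - 1;
	return r;
}

/* Count the pfns of 'block' that fall inside an inclusive physical range. */
static uint64_t pfns_overlapping(struct pfn_range block,
				 uint64_t phys_start, uint64_t phys_end)
{
	uint64_t lo = phys_start >> PAGE_SHIFT_ASSUMED;
	uint64_t hi = phys_end >> PAGE_SHIFT_ASSUMED;

	assert(lo <= hi);
	if (hi < block.start || lo > block.end)
		return 0;	/* no overlap at all */
	if (lo < block.start)
		lo = block.start;
	if (hi > block.end)
		hi = block.end;
	return hi - lo + 1;
}
```

Under these assumptions, pageblock_of(0xbf800000) covers pfns
0xbf800-0xbf9ff: 16 of them lie in System RAM (04200000-bf80ffff) and 496 in
the reserved range bf810000-bfbeffff.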

The bug was already reported here for x86:
https://bugzilla.redhat.com/show_bug.cgi?id=1598462

For x86, it was fixed in the kernel 4.17.7 - but I observed it in the 
kernel 4.17.11 on ARM64. I also observed it on 4.18-rc kernels running in 
KVM virtual machine on ARM when I compiled the guest kernel with 64kB page 
size.


Unable to handle kernel paging request at virtual address fffffffffffffffe
Mem abort info:
  ESR = 0x96000005
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000005
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000791c2068
[fffffffffffffffe] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: ftdi_sio usbserial fuse vhost_net vhost tun bridge stp llc autofs4 udlfb syscopyarea sysfillrect sysimgblt fb_sys_fops fb font binfmt_misc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 iptable_mangle xt_TCPMSS nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack xt_multiport iptable_filter ip_tables x_tables pppoe pppox af_packet ppp_generic slhc nls_utf8 nls_cp852 vfat fat hid_generic usbhid hid snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer snd soundcore nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack sd_mod ipv6 aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64
 sha1_ce sha1_generic efivars xhci_plat_hcd xhci_hcd ahci_platform libahci_platform libahci libata usbcore usb_common mvpp2 unix
CPU: 3 PID: 14823 Comm: updatedb.mlocat Not tainted 4.17.11 #16
Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018
pstate: 00000085 (nzcv daIf -PAN -UAO)
pc : move_freepages_block+0xb4/0x160
lr : steal_suitable_fallback+0xe4/0x188
sp : ffffffc0add9f570
x29: ffffffc0add9f570 x28: 0000000000000000 
x27: ffffffffffffff60 x26: ffffff800886ef58 
x25: 0000000000000008 x24: 0000000000000003 
x23: 0000000000000020 x22: 0000000000000000 
x21: 0000000000000003 x20: ffffff800886ed80 
x19: 0000000000000002 x18: ffffffbf02fefe00 
x17: 0000007fa0916380 x16: ffffff80081dc528 
x15: 0000000000000001 x14: 0000000000000020 
x13: 0000000000000068 x12: 0000000000000080 
x11: ffffffbf02fe8020 x10: ffffff800886ef38 
x9 : 0000000000000000 x8 : 0000000000000000 
x7 : 0000000000100000 x6 : ffffffffffffffff 
x5 : fffffffffffffffe x4 : ffffffbf02feffc0 
x3 : ffffffc0add9f5ac x2 : 00000000000000a0 
x1 : ffffffbf02fe8000 x0 : ffffff800886ed80 
Process updatedb.mlocat (pid: 14823, stack limit = 0x000000005d2941e3)
Call trace:
 move_freepages_block+0xb4/0x160
 get_page_from_freelist+0xad8/0xea8
 __alloc_pages_nodemask+0xac/0x970
 new_slab+0xc0/0x348
 ___slab_alloc.constprop.32+0x2cc/0x350
 __slab_alloc.isra.26.constprop.31+0x24/0x38
 kmem_cache_alloc+0x168/0x198
 spadfs_alloc_inode+0x2c/0x88
 alloc_inode+0x20/0xa0
 iget5_locked+0xf8/0x1c0
 spadfs_iget+0x44/0x4c8
 spadfs_lookup+0x70/0x108
 __lookup_slow+0x78/0x140
 lookup_slow+0x3c/0x60
 walk_component+0x1e4/0x2e0
 path_lookupat.isra.11+0x64/0x1e8
 filename_lookup.part.20+0x6c/0xe8
 user_path_at_empty+0x4c/0x60
 vfs_statx+0x78/0xd8
 sys_newfstatat+0x24/0x48
 el0_svc_naked+0x30/0x34
Code: f9401026 d10004c5 f24000df 9a8110a5 (f94000a5) 
---[ end trace def2ceafdfecd702 ]---
note: updatedb.mlocat[14823] exited with preempt_count 1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-17 19:44 A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory Mikulas Patocka
@ 2018-08-21 10:44 ` Michal Hocko
  2018-08-21 12:58   ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-08-21 10:44 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-mm, Pavel Tatashin

[Cc Pavel in case he has some ideas]

On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
> Hi
> 
> I report this crash on ARM64 on the kernel 4.17.11. The reason is that the 
> function move_freepages_block accesses contiguous runs of 
> pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there 
> and when move_freepages_block stumbles over this hole, it accesses 
> uninitialized page structures and crashes.
> 
> 00000000-03ffffff : System RAM
>   00080000-007bffff : Kernel code
>   00820000-00aa3fff : Kernel data
> 04200000-bf80ffff : System RAM
> bf810000-bfbeffff : reserved
> bfbf0000-bfc8ffff : System RAM
> bfc90000-bffdffff : reserved
> bffe0000-bfffffff : System RAM
> c0000000-dfffffff : MEM
>   c0000000-c00fffff : PCI Bus 0000:01
>     c0000000-c0003fff : 0000:01:00.0
>       c0000000-c0003fff : nvme
> 
> The bug was already reported here for x86:
> https://bugzilla.redhat.com/show_bug.cgi?id=1598462
> 
> For x86, it was fixed in the kernel 4.17.7 - but I observed it in the 
> kernel 4.17.11 on ARM64. I also observed it on 4.18-rc kernels running in 
> KVM virtual machine on ARM when I compiled the guest kernel with 64kB page 
> size.
> 
> 
> Unable to handle kernel paging request at virtual address fffffffffffffffe
> Mem abort info:
>   ESR = 0x96000005
>   Exception class = DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000005
>   CM = 0, WnR = 0
> swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000791c2068
> [fffffffffffffffe] pgd=0000000000000000, pud=0000000000000000
> Internal error: Oops: 96000005 [#1] PREEMPT SMP
> Modules linked in: ftdi_sio usbserial fuse vhost_net vhost tun bridge stp llc autofs4 udlfb syscopyarea sysfillrect sysimgblt fb_sys_fops fb font binfmt_misc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 iptable_mangle xt_TCPMSS nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack xt_multiport iptable_filter ip_tables x_tables pppoe pppox af_packet ppp_generic slhc nls_utf8 nls_cp852 vfat fat hid_generic usbhid hid snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer snd soundcore nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack sd_mod ipv6 aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64
>  sha1_ce sha1_generic efivars xhci_plat_hcd xhci_hcd ahci_platform libahci_platform libahci libata usbcore usb_common mvpp2 unix
> CPU: 3 PID: 14823 Comm: updatedb.mlocat Not tainted 4.17.11 #16
> Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018
> pstate: 00000085 (nzcv daIf -PAN -UAO)
> pc : move_freepages_block+0xb4/0x160
> lr : steal_suitable_fallback+0xe4/0x188
> sp : ffffffc0add9f570
> x29: ffffffc0add9f570 x28: 0000000000000000 
> x27: ffffffffffffff60 x26: ffffff800886ef58 
> x25: 0000000000000008 x24: 0000000000000003 
> x23: 0000000000000020 x22: 0000000000000000 
> x21: 0000000000000003 x20: ffffff800886ed80 
> x19: 0000000000000002 x18: ffffffbf02fefe00 
> x17: 0000007fa0916380 x16: ffffff80081dc528 
> x15: 0000000000000001 x14: 0000000000000020 
> x13: 0000000000000068 x12: 0000000000000080 
> x11: ffffffbf02fe8020 x10: ffffff800886ef38 
> x9 : 0000000000000000 x8 : 0000000000000000 
> x7 : 0000000000100000 x6 : ffffffffffffffff 
> x5 : fffffffffffffffe x4 : ffffffbf02feffc0 
> x3 : ffffffc0add9f5ac x2 : 00000000000000a0 
> x1 : ffffffbf02fe8000 x0 : ffffff800886ed80 
> Process updatedb.mlocat (pid: 14823, stack limit = 0x000000005d2941e3)
> Call trace:
>  move_freepages_block+0xb4/0x160
>  get_page_from_freelist+0xad8/0xea8
>  __alloc_pages_nodemask+0xac/0x970
>  new_slab+0xc0/0x348
>  ___slab_alloc.constprop.32+0x2cc/0x350
>  __slab_alloc.isra.26.constprop.31+0x24/0x38
>  kmem_cache_alloc+0x168/0x198
>  spadfs_alloc_inode+0x2c/0x88
>  alloc_inode+0x20/0xa0
>  iget5_locked+0xf8/0x1c0
>  spadfs_iget+0x44/0x4c8
>  spadfs_lookup+0x70/0x108
>  __lookup_slow+0x78/0x140
>  lookup_slow+0x3c/0x60
>  walk_component+0x1e4/0x2e0
>  path_lookupat.isra.11+0x64/0x1e8
>  filename_lookup.part.20+0x6c/0xe8
>  user_path_at_empty+0x4c/0x60
>  vfs_statx+0x78/0xd8
>  sys_newfstatat+0x24/0x48
>  el0_svc_naked+0x30/0x34
> Code: f9401026 d10004c5 f24000df 9a8110a5 (f94000a5) 
> ---[ end trace def2ceafdfecd702 ]---
> note: updatedb.mlocat[14823] exited with preempt_count 1

-- 
Michal Hocko
SUSE Labs


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-21 10:44 ` Michal Hocko
@ 2018-08-21 12:58   ` James Morse
  2018-08-23 11:02     ` Mikulas Patocka
  0 siblings, 1 reply; 18+ messages in thread
From: James Morse @ 2018-08-21 12:58 UTC (permalink / raw)
  To: Michal Hocko, Mikulas Patocka
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-mm, Pavel Tatashin

Hi guys,

On 08/21/2018 11:44 AM, Michal Hocko wrote:
> On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
>> I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
>> function move_freepages_block accesses contiguous runs of
>> pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
>> and when move_freepages_block stumbles over this hole, it accesses
>> uninitialized page structures and crashes.

Any idea if this is nomap (so a hole in the linear map), or a missing struct page?


>> 00000000-03ffffff : System RAM
>>    00080000-007bffff : Kernel code
>>    00820000-00aa3fff : Kernel data
>> 04200000-bf80ffff : System RAM
>> bf810000-bfbeffff : reserved
>> bfbf0000-bfc8ffff : System RAM
>> bfc90000-bffdffff : reserved
>> bffe0000-bfffffff : System RAM
>> c0000000-dfffffff : MEM
>>    c0000000-c00fffff : PCI Bus 0000:01
>>      c0000000-c0003fff : 0000:01:00.0
>>        c0000000-c0003fff : nvme

To test Laura's bounds-of-zone theory [0], could you put some empty space between the 
nvme and the System RAM? (It sounds like this is a KVM guest). Reducing the amount of 
memory is probably easiest.


>> The bug was already reported here for x86:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1598462
>>
>> For x86, it was fixed in the kernel 4.17.7 - but I observed it in the
>> kernel 4.17.11 on ARM64. I also observed it on 4.18-rc kernels running in
>> KVM virtual machine on ARM when I compiled the guest kernel with 64kB page
>> size.

I'm not sure this is the same bug.

[1] reports hitting a VM_BUG, this is a dereference of -ENOENT:
>> Unable to handle kernel paging request at virtual address fffffffffffffffe

Does your kernel have HOLES_IN_ZONE enabled? (It looks like it depends on NUMA)
Could you reproduce this with CONFIG_DEBUG_VM enabled?

move_freepages() uses pfn_valid_within(), so it should handle missing struct pages in 
this range.


>> CPU: 3 PID: 14823 Comm: updatedb.mlocat Not tainted 4.17.11 #16
>> Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018
>> pstate: 00000085 (nzcv daIf -PAN -UAO)
>> pc : move_freepages_block+0xb4/0x160
>> lr : steal_suitable_fallback+0xe4/0x188

Any chance you could addr2line these?


>> Call trace:
>>   move_freepages_block+0xb4/0x160
>>   get_page_from_freelist+0xad8/0xea8
>>   __alloc_pages_nodemask+0xac/0x970
>>   new_slab+0xc0/0x348
>>   ___slab_alloc.constprop.32+0x2cc/0x350
>>   __slab_alloc.isra.26.constprop.31+0x24/0x38
>>   kmem_cache_alloc+0x168/0x198
>>   spadfs_alloc_inode+0x2c/0x88
>>   alloc_inode+0x20/0xa0
>>   iget5_locked+0xf8/0x1c0

>>   spadfs_iget+0x44/0x4c8
>>   spadfs_lookup+0x70/0x108

Hmmm. What's this?


Thanks,

James


[0] https://www.spinics.net/lists/linux-mm/msg157223.html
[1] https://www.spinics.net/lists/linux-mm/msg156764.html


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-21 12:58   ` James Morse
@ 2018-08-23 11:02     ` Mikulas Patocka
  2018-08-23 11:10       ` Michal Hocko
  2018-08-23 14:06       ` James Morse
  0 siblings, 2 replies; 18+ messages in thread
From: Mikulas Patocka @ 2018-08-23 11:02 UTC (permalink / raw)
  To: James Morse
  Cc: Michal Hocko, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin



On Tue, 21 Aug 2018, James Morse wrote:

> Hi guys,
> 
> On 08/21/2018 11:44 AM, Michal Hocko wrote:
> > On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
> > > I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
> > > function move_freepages_block accesses contiguous runs of
> > > pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
> > > and when move_freepages_block stumbles over this hole, it accesses
> > > uninitialized page structures and crashes.
> 
> Any idea if this is nomap (so a hole in the linear map), or a missing struct
> page?

The page for this hole seems to be filled with 0xff.

> > > 00000000-03ffffff : System RAM
> > >    00080000-007bffff : Kernel code
> > >    00820000-00aa3fff : Kernel data
> > > 04200000-bf80ffff : System RAM
> > > bf810000-bfbeffff : reserved
> > > bfbf0000-bfc8ffff : System RAM
> > > bfc90000-bffdffff : reserved
> > > bffe0000-bfffffff : System RAM
> > > c0000000-dfffffff : MEM
> > >    c0000000-c00fffff : PCI Bus 0000:01
> > >      c0000000-c0003fff : 0000:01:00.0
> > >        c0000000-c0003fff : nvme
> To test Laura's bounds-of-zone theory [0], could you put some empty space
> between the nvme and the System RAM? (It sounds like this is a KVM guest).
> Reducing the amount of memory is probably easiest.

This is not KVM - it is real hardware with a real PCIe NVMe device. I don't 
have a smaller memory stick.

The board can use u-boot firmware or EFI firmware. The u-boot firmware 
doesn't put a hole in the memory map and the board has been running with 
it for several months without a problem.

The EFI firmware puts a hole below 0xc0000000 and I got a crash after two 
weeks of uptime.

> > > The bug was already reported here for x86:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1598462
> > > 
> > > For x86, it was fixed in the kernel 4.17.7 - but I observed it in the
> > > kernel 4.17.11 on ARM64. I also observed it on 4.18-rc kernels running in
> > > KVM virtual machine on ARM when I compiled the guest kernel with 64kB page
> > > size.
> 
> I'm not sure this is the same bug.
> 
> [1] reports hitting a VM_BUG, this is a dereference of -ENOENT:

This crash is not from -ENOENT. It crashes because page->compound_head is 
0xffffffffffffffff (see below).

If I enable CONFIG_DEBUG_VM, I also get VM_BUG.

> > > Unable to handle kernel paging request at virtual address fffffffffffffffe
> 
> Does your kernel have HOLES_IN_ZONE enabled? (It looks like it depends on
> NUMA)

No.

> Could you reproduce this with CONFIG_DEBUG_VM enabled?

I reproduced it in KVM with 64k pages and I enabled CONFIG_DEBUG_VM, see 
below. (The bug can be triggered more quickly in KVM.)

> move_freepages() uses pfn_valid_within(), so it should handle missing struct
> pages in this range.
> 
> 
> > > CPU: 3 PID: 14823 Comm: updatedb.mlocat Not tainted 4.17.11 #16
> > > Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin,
> > > BIOS EDK II Jul 30 2018
> > > pstate: 00000085 (nzcv daIf -PAN -UAO)
> > > pc : move_freepages_block+0xb4/0x160
> > > lr : steal_suitable_fallback+0xe4/0x188
> 
> Any chance you could addr2line these?

I analyzed the assembler:
PageBuddy in move_freepages returns false.
Then we call PageLRU; the macro calls PF_HEAD, which is compound_head().
compound_head() reads page->compound_head, which is 0xffffffffffffffff, so it 
returns 0xfffffffffffffffe - and accessing this address causes the crash.
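
The arithmetic can be checked in user space. A minimal sketch, not kernel
code: struct page_model, model_compound_head() and head_of_poisoned_page()
are made-up names, and only the bit-0 rule of the kernel's compound_head()
helper is modelled (bit 0 set means "tail page", and the head pointer is the
stored value minus one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only the word the lookup reads. */
struct page_model {
	uint64_t compound_head;	/* bit 0 set: tail page, other bits point at the head */
};

/* Same decision compound_head() makes, expressed on plain integers. */
static uint64_t model_compound_head(const struct page_model *page)
{
	uint64_t head;

	assert(page != NULL);
	head = page->compound_head;
	if (head & 1)
		return head - 1;	/* looks like a tail page: strip the tag bit */
	return (uint64_t)(uintptr_t)page;	/* not a tail: the page is its own head */
}

/* Head "pointer" computed for a struct page whose memory is all 0xff. */
static uint64_t head_of_poisoned_page(void)
{
	struct page_model poisoned = { .compound_head = UINT64_MAX };

	return model_compound_head(&poisoned);
}
```

head_of_poisoned_page() yields 0xfffffffffffffffe, which matches both the
faulting virtual address and the value in x5 in the register dump.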

> > > Call trace:
> > >   move_freepages_block+0xb4/0x160
> > >   get_page_from_freelist+0xad8/0xea8
> > >   __alloc_pages_nodemask+0xac/0x970
> > >   new_slab+0xc0/0x348
> > >   ___slab_alloc.constprop.32+0x2cc/0x350
> > >   __slab_alloc.isra.26.constprop.31+0x24/0x38
> > >   kmem_cache_alloc+0x168/0x198
> > >   spadfs_alloc_inode+0x2c/0x88
> > >   alloc_inode+0x20/0xa0
> > >   iget5_locked+0xf8/0x1c0
> 
> > >   spadfs_iget+0x44/0x4c8
> > >   spadfs_lookup+0x70/0x108
> 
> Hmmm. What's this?

http://artax.karlin.mff.cuni.cz/~mikulas/spadfs/download/

> Thanks,
> 
> James
> 
> 
> [0] https://www.spinics.net/lists/linux-mm/msg157223.html
> [1] https://www.spinics.net/lists/linux-mm/msg156764.html

The same crash in KVM. The guest kernel has 64k pages. I enabled 
CONFIG_DEBUG_VM:

[ 1493.526129] page:fffffdff802e1780 is uninitialized and poisoned
[ 1493.526136] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1493.528030] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1493.529320] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[ 1493.530441] ------------[ cut here ]------------
[ 1493.531301] kernel BUG at include/linux/mm.h:978!
[ 1493.532176] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1493.533196] Modules linked in: raid0 raid10 dm_delay xfs reiserfs loop dm_crypt dm_zero dm_integrity raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx md_mod dm_thin_pool dm_cache_smq dm_cache dm_persistent_data dm_bio_prison libcrc32c dm_mirror dm_region_hash dm_log dm_snapshot dm_bufio dm_mod ipv6 autofs4 binfmt_misc nls_utf8 nls_cp852 vfat fat af_packet aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce crct10dif_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64 sha1_ce sha1_generic efivars virtio_net virtio_rng net_failover rng_core failover virtio_console ext4 crc32c_generic crc16 mbcache jbd2 virtio_scsi sd_mod scsi_mod virtio_blk virtio_mmio virtio_pci virtio_ring virtio [last unloaded: brd]
[ 1493.545466] CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7
[ 1493.546540] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 1493.547833] pstate: 40000085 (nZcv daIf -PAN -UAO)
[ 1493.548749] pc : move_freepages_block+0x144/0x248
[ 1493.549647] lr : move_freepages_block+0x144/0x248
[ 1493.550539] sp : fffffe0071177680
[ 1493.551176] x29: fffffe0071177680 x28: fffffc000861f3f8
[ 1493.552184] x27: 0000000000000048 x26: fffffc0008492000
[ 1493.553197] x25: fffffe007117771c x24: 000000000007ffc0
[ 1493.554203] x23: fffffc000861ef80 x22: fffffdff802fffc0
[ 1493.555209] x21: 0000000000000020 x20: fffffdff80280000
[ 1493.556220] x19: fffffdff802e1780 x18: 0000000000000000
[ 1493.557227] x17: 000003ff88424b08 x16: fffffc0008182c9c
[ 1493.558232] x15: 000000000000000a x14: 0720072007200720
[ 1493.559239] x13: 0720072007200720 x12: 0720072007200720
[ 1493.560249] x11: 0720072907290770 x10: 072807640765076e
[ 1493.561256] x9 : 076f07730769076f x8 : 0000000000000000
[ 1493.562261] x7 : 0750072807450747 x6 : 0000000000000007
[ 1493.563270] x5 : fffffe00bff30750 x4 : 0000000000000001
[ 1493.564276] x3 : 0000000000000007 x2 : 0000000000000007
[ 1493.565283] x1 : fffffe006260cd00 x0 : 0000000000000034
[ 1493.566297] Process dd (pid: 25236, stack limit = 0x0000000094cc07fb)
[ 1493.567506] Call trace:
[ 1493.567985]  move_freepages_block+0x144/0x248
[ 1493.568812]  steal_suitable_fallback+0x100/0x16c
[ 1493.569694]  get_page_from_freelist+0x440/0xb20
[ 1493.570554]  __alloc_pages_nodemask+0xe8/0x838
[ 1493.571401]  new_slab+0xd4/0x418
[ 1493.572022]  ___slab_alloc.constprop.27+0x380/0x4a8
[ 1493.572952]  __slab_alloc.isra.21.constprop.26+0x24/0x34
[ 1493.573955]  kmem_cache_alloc+0xa8/0x180
[ 1493.574704]  alloc_buffer_head+0x1c/0x90
[ 1493.575452]  alloc_page_buffers+0x68/0xb0
[ 1493.576222]  create_empty_buffers+0x20/0x1ec
[ 1493.577033]  create_page_buffers+0xb0/0xf0
[ 1493.577815]  __block_write_begin_int+0xc4/0x564
[ 1493.578676]  __block_write_begin+0x10/0x18
[ 1493.579457]  block_write_begin+0x48/0xd0
[ 1493.580212]  blkdev_write_begin+0x28/0x30
[ 1493.580977]  generic_perform_write+0x98/0x16c
[ 1493.581807]  __generic_file_write_iter+0x138/0x168
[ 1493.582715]  blkdev_write_iter+0x80/0xf0
[ 1493.583470]  __vfs_write+0xe4/0x10c
[ 1493.584138]  vfs_write+0xb4/0x168
[ 1493.584775]  ksys_write+0x44/0x88
[ 1493.585412]  sys_write+0xc/0x14
[ 1493.586018]  el0_svc_naked+0x30/0x34
[ 1493.586708] Code: aa1303e0 90001a01 91296421 94008902 (d4210000)
[ 1493.587857] ---[ end trace 1601ba47f6e883fe ]---
[ 1493.588780] note: dd[25236] exited with preempt_count 1

memory map for the KVM guest:

09000000-09000fff : pl011@9000000
  09000000-09000fff : pl011@9000000
09030000-09030fff : pl061@9030000
10000000-3efeffff : pcie@10000000
  10000000-101fffff : PCI Bus 0000:01
    10000000-1003ffff : 0000:01:00.0
    10040000-10040fff : 0000:01:00.0
  10200000-103fffff : PCI Bus 0000:02
  10400000-105fffff : PCI Bus 0000:03
    10400000-10400fff : 0000:03:00.0
  10600000-107fffff : PCI Bus 0000:04
  10800000-109fffff : PCI Bus 0000:05
    10800000-10800fff : 0000:05:00.0
3f000000-3fffffff : PCI ECAM
40000000-f85dffff : System RAM
  40080000-4057ffff : Kernel code
  405d0000-408effff : Kernel data
f85e0000-f86bffff : reserved
f86c0000-f86dffff : System RAM
f86e0000-f874ffff : reserved
f8750000-fbc1ffff : System RAM
fbc20000-fbffffff : reserved
fc000000-ffffffff : System RAM
8000000000-ffffffffff : pcie@10000000
  8000000000-80001fffff : PCI Bus 0000:01
    8000000000-8000003fff : 0000:01:00.0
      8000000000-8000003fff : virtio-pci-modern
  8000200000-80003fffff : PCI Bus 0000:02
  8000400000-80005fffff : PCI Bus 0000:03
    8000400000-8000403fff : 0000:03:00.0
      8000400000-8000403fff : virtio-pci-modern
  8000600000-80007fffff : PCI Bus 0000:04
    8000600000-8000603fff : 0000:04:00.0
      8000600000-8000603fff : virtio-pci-modern
  8000800000-80009fffff : PCI Bus 0000:05
    8000800000-8000803fff : 0000:05:00.0
      8000800000-8000803fff : virtio-pci-modern

Mikulas


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 11:02     ` Mikulas Patocka
@ 2018-08-23 11:10       ` Michal Hocko
  2018-08-23 11:16         ` Mikulas Patocka
  2018-08-23 14:06       ` James Morse
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-08-23 11:10 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: James Morse, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin

On Thu 23-08-18 07:02:37, Mikulas Patocka wrote:
[...]
> This crash is not from -ENOENT. It crashes because page->compound_head is 
> 0xffffffffffffffff (see below).
> 
> If I enable CONFIG_DEBUG_VM, I also get VM_BUG.

This smells like the struct page is not initialized properly. How is
this memory range added? I mean is it brought up by the memory hotplug
or during the boot?
-- 
Michal Hocko
SUSE Labs


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 11:10       ` Michal Hocko
@ 2018-08-23 11:16         ` Mikulas Patocka
  2018-08-23 11:23           ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2018-08-23 11:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: James Morse, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin



On Thu, 23 Aug 2018, Michal Hocko wrote:

> On Thu 23-08-18 07:02:37, Mikulas Patocka wrote:
> [...]
> > This crash is not from -ENOENT. It crashes because page->compound_head is 
> > 0xffffffffffffffff (see below).
> > 
> > If I enable CONFIG_DEBUG_VM, I also get VM_BUG.
> 
> This smells like the struct page is not initialized properly. How is
> this memory range added? I mean is it brought up by the memory hotplug
> or during the boot?
> -- 
> Michal Hocko
> SUSE Labs

During the boot. There's no hotplug.

Mikulas


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 11:16         ` Mikulas Patocka
@ 2018-08-23 11:23           ` Michal Hocko
  2018-08-23 13:13             ` Pasha Tatashin
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-08-23 11:23 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: James Morse, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin

On Thu 23-08-18 07:16:34, Mikulas Patocka wrote:
> 
> 
> On Thu, 23 Aug 2018, Michal Hocko wrote:
> 
> > On Thu 23-08-18 07:02:37, Mikulas Patocka wrote:
> > [...]
> > > This crash is not from -ENOENT. It crashes because page->compound_head is 
> > > 0xffffffffffffffff (see below).
> > > 
> > > If I enable CONFIG_DEBUG_VM, I also get VM_BUG.
> > 
> > This smells like the struct page is not initialized properly. How is
> > this memory range added? I mean is it brought up by the memory hotplug
> > or during the boot?
> 
> > During the boot. There's no hotplug.

Do you have any trail where the memory range is registered from?
-- 
Michal Hocko
SUSE Labs


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 11:23           ` Michal Hocko
@ 2018-08-23 13:13             ` Pasha Tatashin
  2018-08-23 13:14               ` Pasha Tatashin
  2018-08-23 14:34               ` Mikulas Patocka
  0 siblings, 2 replies; 18+ messages in thread
From: Pasha Tatashin @ 2018-08-23 13:13 UTC (permalink / raw)
  To: Michal Hocko, Mikulas Patocka
  Cc: James Morse, Catalin Marinas, Will Deacon, linux-arm-kernel, linux-mm

On 8/23/18 7:23 AM, Michal Hocko wrote:
> On Thu 23-08-18 07:16:34, Mikulas Patocka wrote:
>>
>>
>> On Thu, 23 Aug 2018, Michal Hocko wrote:
>>
>>> On Thu 23-08-18 07:02:37, Mikulas Patocka wrote:
>>> [...]
>>>> This crash is not from -ENOENT. It crashes because page->compound_head is 
>>>> 0xffffffffffffffff (see below).
>>>>
>>>> If I enable CONFIG_DEBUG_VM, I also get VM_BUG.
>>>
>>> This smells like the struct page is not initialized properly. How is
>>> this memory range added? I mean is it brought up by the memory hotplug
>>> or during the boot?

I believe it is due to uninitialized struct pages. Mikulas, could you
please provide config file, and also the full console output.

Please make sure that you have:
CONFIG_DEBUG_VM=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y

I wonder what kind of struct page memory layout is used, and also if
deferred struct pages are enabled or not.

Have you tried bisecting the problem?

Thank you,
Pavel


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 13:13             ` Pasha Tatashin
@ 2018-08-23 13:14               ` Pasha Tatashin
  2018-08-23 14:34               ` Mikulas Patocka
  1 sibling, 0 replies; 18+ messages in thread
From: Pasha Tatashin @ 2018-08-23 13:14 UTC (permalink / raw)
  To: Michal Hocko, Mikulas Patocka
  Cc: James Morse, Catalin Marinas, Will Deacon, linux-arm-kernel, linux-mm



On 8/23/18 9:13 AM, Pavel Tatashin wrote:
> On 8/23/18 7:23 AM, Michal Hocko wrote:
>> On Thu 23-08-18 07:16:34, Mikulas Patocka wrote:
>>>
>>>
>>> On Thu, 23 Aug 2018, Michal Hocko wrote:
>>>
>>>> On Thu 23-08-18 07:02:37, Mikulas Patocka wrote:
>>>> [...]
>>>>> This crash is not from -ENOENT. It crashes because page->compound_head is 
>>>>> 0xffffffffffffffff (see below).
>>>>>
>>>>> If I enable CONFIG_DEBUG_VM, I also get VM_BUG.
>>>>
>>>> This smells like the struct page is not initialized properly. How is
>>>> this memory range added? I mean is it brought up by the memory hotplug
>>>> or during the boot?
> 
> I believe it is due to uninitialized struct pages. Mikulas, could you
> please provide config file, and also the full console output.
> 
> Please make sure that you have:
> CONFIG_DEBUG_VM=y
> CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y

I meant:
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y

> 
> I wonder what kind of struct page memory layout is used, and also if
> deferred struct pages are enabled or not.
> 
> Have you tried bisecting the problem?
> 
> Thank you,
> Pavel
> 


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 11:02     ` Mikulas Patocka
  2018-08-23 11:10       ` Michal Hocko
@ 2018-08-23 14:06       ` James Morse
  2018-08-24 11:41         ` Michal Hocko
  1 sibling, 1 reply; 18+ messages in thread
From: James Morse @ 2018-08-23 14:06 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Michal Hocko, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin

Hi Mikulas,

On 23/08/18 12:02, Mikulas Patocka wrote:
> On Tue, 21 Aug 2018, James Morse wrote:
>> On 08/21/2018 11:44 AM, Michal Hocko wrote:
>>> On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
>>>> I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
>>>> function move_freepages_block accesses contiguous runs of
>>>> pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
>>>> and when move_freepages_block stumbles over this hole, it accesses
>>>> uninitialized page structures and crashes.
>>
>> Any idea if this is nomap (so a hole in the linear map), or a missing struct
>> page?
> 
> The page for this hole seems to be filled with 0xff.

This sounds like a memblock:nomap region, it has a struct page, but it hasn't
been initialized.

deferred_init_memmap() won't initialise struct pages for memblock:nomap pages as
its for_each_free_mem_range() loops use MEMBLOCK_NONE as the required flags.

pfn_valid() will return false for these nomap pages, so the struct page should
never be accessed.


For the fault you're seeing, move_freepages() is using pfn_valid_within(), but
this is optimised out as you don't have HOLES_IN_ZONE.

This looks like a disconnect between nomap, ARCH_HAS_HOLES_MEMORYMODEL and
HOLES_IN_ZONE.

Arm64 only enables HOLES_IN_ZONE for NUMA systems:
6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")

It doesn't look like you can disable ARCH_HAS_HOLES_MEMORYMODEL or SPARSEMEM
for arm64.


My best-guess is that pfn_valid_within() shouldn't be optimised out if
ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.

Does something like this solve the problem?:
============================%<============================
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..5e27095a15f4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
  * pfn_valid_within() should be used in this case; we optimise this away
  * when we have no holes within a MAX_ORDER_NR_PAGES block.
  */
-#ifdef CONFIG_HOLES_IN_ZONE
+#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
 #define pfn_valid_within(pfn) pfn_valid(pfn)
 #else
 #define pfn_valid_within(pfn) (1)
============================%<============================


>> To test Laura's bounds-of-zone theory [0], could you put some empty space
>> between the nvme and the System RAM? (It sounds like this is a KVM guest).
>> Reducing the amount of memory is probably easiest.
> 
> This is not KVM - it is real hardware with real PCIe nvme device. I don't 
> have smaller memory stick.

Ah, you mentioned KVM/guests further down. Given your nvme is right up against
the top of the System RAM, I assumed this was a guest!


> The board can use u-boot firmware or EFI firmware. The u-boot firmware 
> doesn't put a hole in the memory map and the board has been running with 
> it for several months without a problem.

> The EFI firmware puts a hole below 0xc0000000 and I got a crash after two 
> weeks of uptime.

This will be because of UEFI's use of nomap when the EFI memory map describes
the memory as having incompatible attributes to the kernel linear-map.

(if you boot with efi=debug it will dump the uefi memory map)


> I analyzed the assembler:
> PageBuddy in move_freepages returns false
> Then we call PageLRU, the macro calls PF_HEAD which is compound_head()
> compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
> returns 0xfffffffffffffffe - and accessing this address causes a crash

Thanks!
That wasn't straightforward to work out without the vmlinux.

Because you see all-ones, even in KVM, it looks like the struct page is being
initialized like that deliberately... I haven't found where this might be happening.
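
The arithmetic in the analysis above can be sketched in user space. This is only a model, assuming that bit 0 of compound_head marks a tail page and the remaining bits hold the head page's address; struct ch_page and ch_compound_head() are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one struct page field that matters here. */
struct ch_page {
	uint64_t compound_head;	/* bit 0 set: tail page, other bits: head address */
};

/*
 * Returns, as an integer, the address a compound_head()-style lookup
 * would produce for a page located at page_addr.
 */
static uint64_t ch_compound_head(const struct ch_page *page, uint64_t page_addr)
{
	assert(page != NULL);

	uint64_t head = page->compound_head;

	if (head & 1)
		return head - 1;	/* looks like a tail page: strip the tag bit */
	return page_addr;		/* not a tail page: the page is its own head */
}
```

With compound_head set to all-ones the model treats the page as a tail page and returns 0xfffffffffffffffe, the faulting address from the oops. A zeroed field would instead return the page's own address.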



Thanks,

James


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 13:13             ` Pasha Tatashin
  2018-08-23 13:14               ` Pasha Tatashin
@ 2018-08-23 14:34               ` Mikulas Patocka
  1 sibling, 0 replies; 18+ messages in thread
From: Mikulas Patocka @ 2018-08-23 14:34 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Michal Hocko, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel, linux-mm



On Thu, 23 Aug 2018, Pasha Tatashin wrote:

> On 8/23/18 7:23 AM, Michal Hocko wrote:
> > On Thu 23-08-18 07:16:34, Mikulas Patocka wrote:
> >>
> >>
> >> On Thu, 23 Aug 2018, Michal Hocko wrote:
> >>
> >>> On Thu 23-08-18 07:02:37, Mikulas Patocka wrote:
> >>> [...]
> >>>> This crash is not from -ENOENT. It crashes because page->compound_head is 
> >>>> 0xffffffffffffffff (see below).
> >>>>
> >>>> If I enable CONFIG_DEBUG_VM, I also get VM_BUG.
> >>>
> >>> This smells like the struct page is not initialized properly. How is
> >>> this memory range added? I mean is it brought up by the memory hotplug
> >>> or during the boot?
> 
> I believe it is due to uninitialized struct pages. Mikulas, could you
> please provide config file, and also the full console output.
> 
> Please make sure that you have:
> CONFIG_DEBUG_VM=y
> CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
> 
> I wonder what kind of struct page memory layout is used, and also if
> deferred struct pages are enabled or not.

I uploaded configs and console logs (for the real hardware and for the 
virtual machine) here: 
http://people.redhat.com/~mpatocka/testcases/arm64-config/

The virtual machine was running the lvm2 testsuite while the crash 
happened.

> Have you tried bisecting the problem?

I may try some old kernel in the virtual machine to test if the bug 
happens on it.

> Thank you,
> Pavel

Mikulas


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-23 14:06       ` James Morse
@ 2018-08-24 11:41         ` Michal Hocko
  2018-08-29 17:37           ` James Morse
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-08-24 11:41 UTC (permalink / raw)
  To: James Morse
  Cc: Mikulas Patocka, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin

On Thu 23-08-18 15:06:08, James Morse wrote:
[...]
> My best-guess is that pfn_valid_within() shouldn't be optimised out if
> ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.
> 
> Does something like this solve the problem?:
> ============================%<============================
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 32699b2dc52a..5e27095a15f4 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
>   * pfn_valid_within() should be used in this case; we optimise this away
>   * when we have no holes within a MAX_ORDER_NR_PAGES block.
>   */
> -#ifdef CONFIG_HOLES_IN_ZONE
> +#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
>  #define pfn_valid_within(pfn) pfn_valid(pfn)
>  #else
>  #define pfn_valid_within(pfn) (1)
> ============================%<============================

This is the first time I hear about CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.
Why doesn't it imply CONFIG_HOLES_IN_ZONE?

> > I analyzed the assembler:
> > PageBuddy in move_freepages returns false
> > Then we call PageLRU, the macro calls PF_HEAD which is compound_head()
> > compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
> > returns 0xfffffffffffffffe - and accessing this address causes a crash
> 
> Thanks!
> That wasn't straightforward to work out without the vmlinux.
> 
> Because you see all-ones, even in KVM, it looks like the struct page is being
> initialized like that deliberately... I haven't found where this might be happening.

It should be

sparse_add_one_section
#ifdef CONFIG_DEBUG_VM
	/*
	 * Poison uninitialized struct pages in order to catch invalid flags
	 * combinations.
	 */
	memset(memmap, PAGE_POISON_PATTERN, sizeof(struct page) * PAGES_PER_SECTION);
#endif
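
As a rough illustration of why a memset like the one above yields all-ones fields, here is a user-space sketch. It assumes the poison pattern is -1 (so memset() writes 0xff into every byte); struct poison_page and both helpers are hypothetical stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the kernel's PAGE_POISON_PATTERN; an assumption of this sketch. */
#define POISON_MODEL_PATTERN (-1L)

/* Hypothetical two-field stand-in for struct page. */
struct poison_page {
	unsigned long flags;
	unsigned long compound_head;
};

/* memset() uses only the low byte of its value argument, here 0xff. */
static void poison_memmap(struct poison_page *memmap, size_t nr_pages)
{
	assert(memmap != NULL);
	memset(memmap, POISON_MODEL_PATTERN, sizeof(*memmap) * nr_pages);
}

/* A simple check in the spirit of PagePoisoned(): flags still hold the pattern. */
static bool poison_page_poisoned(const struct poison_page *page)
{
	return page->flags == (unsigned long)POISON_MODEL_PATTERN;
}
```

After poison_memmap(), every field of every entry reads as all-ones until something initialises it, which matches the 0xffffffffffffffff seen in page->compound_head.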

-- 
Michal Hocko
SUSE Labs


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-24 11:41         ` Michal Hocko
@ 2018-08-29 17:37           ` James Morse
  2018-08-30 15:58             ` Mikulas Patocka
  2018-09-03 19:33             ` Michal Hocko
  0 siblings, 2 replies; 18+ messages in thread
From: James Morse @ 2018-08-29 17:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mikulas Patocka, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin, Ard Biesheuvel

Hi Michal,

(CC: +Ard)

On 24/08/18 12:41, Michal Hocko wrote:
> On Thu 23-08-18 15:06:08, James Morse wrote:
> [...]
>> My best-guess is that pfn_valid_within() shouldn't be optimised out if
>> ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.
>>
>> Does something like this solve the problem?:
>> ============================%<============================
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 32699b2dc52a..5e27095a15f4 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
>>   * pfn_valid_within() should be used in this case; we optimise this away
>>   * when we have no holes within a MAX_ORDER_NR_PAGES block.
>>   */
>> -#ifdef CONFIG_HOLES_IN_ZONE
>> +#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
>>  #define pfn_valid_within(pfn) pfn_valid(pfn)
>>  #else
>>  #define pfn_valid_within(pfn) (1)
>> ============================%<============================

After plenty of grepping, git-archaeology and help from others, I think I have a
clearer picture of what these options do.


Please correct me if I've explained something wrong here:

> This is the first time I hear about CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.

The comment in include/linux/mmzone.h describes this as relevant when parts of the
memmap have been free()d. This would happen on systems where memory is smaller
than a sparsemem-section, and the extra struct pages are expensive.
pfn_valid() on these systems returns true for the whole sparsemem-section, so an
extra memmap_valid_within() check is needed.

This is independent of nomap, and isn't relevant on arm64 as our pfn_valid()
always tests the page in memblock due to nomap pages, which can occur anywhere.
(I will propose a patch removing ARCH_HAS_HOLES_MEMORYMODEL for arm64.)


HOLES_IN_ZONE is similar: it is needed when a region of memory is smaller than
MAX_ORDER_NR_PAGES, possibly due to nomap holes.

6d526ee26ccd only enabled it for NUMA systems on arm64, because the NUMA code
was first to fall foul of this, but there is nothing NUMA specific about nomap
holes within a MAX_ORDER_NR_PAGES region.

I'm convinced arm64 should always enable HOLES_IN_ZONE because nomap pages can
occur anywhere. I'll post a fix.


Is it valid to have HOLES_IN_ZONE and !HAVE_ARCH_PFN_VALID?
This would mean pfn_valid_within() is necessary, but pfn_valid() is only looking
at sparse-sections. It looks like ia64 and mips:CAVIUM_OCTEON_SOC are both
configured like this...


> Why doesn't it imply CONFIG_HOLES_IN_ZONE?

I guess the sizes of a sparsemem-section and MAX_ORDER_NR_PAGES may call for
HAS_HOLES_MEMORYMODEL but not HOLES_IN_ZONE, e.g. if only 128MB of memory
existed in a 256MB sparsemem-section, but the 4MB MAX_ORDER_NR_PAGES blocks are
always present if any of their pages are present.
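
The example sizes above can be checked with a few lines of arithmetic. This sketch assumes the populated memory is contiguous and starts on a section boundary; layout_has_partial_block() and the LAYOUT_MB() macro are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAYOUT_MB(x) ((uint64_t)(x) << 20)

/*
 * Memory occupies [0, present). Returns true if that leaves some
 * block of block_size bytes only partly populated.
 */
static bool layout_has_partial_block(uint64_t present, uint64_t block_size)
{
	assert(block_size != 0);
	return (present % block_size) != 0;
}
```

With 128MB present, a 256MB section is only half populated (the HAS_HOLES_MEMORYMODEL case), while every 4MB block is either fully present or fully absent, so no hole falls inside a MAX_ORDER_NR_PAGES block.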


>>> I analyzed the assembler:
>>> PageBuddy in move_freepages returns false
>>> Then we call PageLRU, the macro calls PF_HEAD which is compound_head()
>>> compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
>>> returns 0xfffffffffffffffe - and accessing this address causes a crash
>>
>> Thanks!
>> That wasn't straightforward to work out without the vmlinux.
>>
>> Because you see all-ones, even in KVM, it looks like the struct page is being
>> initialized like that deliberately... I haven't found where this might be happening.
> 
> It should be
> 
> sparse_add_one_section
> #ifdef CONFIG_DEBUG_VM
> 	/*
> 	 * Poison uninitialized struct pages in order to catch invalid flags
> 	 * combinations.
> 	 */
> 	memset(memmap, PAGE_POISON_PATTERN, sizeof(struct page) * PAGES_PER_SECTION);
> #endif

Aha, thanks. (I expected KVM's uninitialized memory to always be zero).


Thanks!

James


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-29 17:37           ` James Morse
@ 2018-08-30 15:58             ` Mikulas Patocka
  2018-08-30 16:11               ` Will Deacon
  2018-08-30 16:25               ` James Morse
  2018-09-03 19:33             ` Michal Hocko
  1 sibling, 2 replies; 18+ messages in thread
From: Mikulas Patocka @ 2018-08-30 15:58 UTC (permalink / raw)
  To: James Morse
  Cc: Michal Hocko, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin, Ard Biesheuvel



On Wed, 29 Aug 2018, James Morse wrote:

> Hi Michal,
> 
> (CC: +Ard)
> 
> On 24/08/18 12:41, Michal Hocko wrote:
> > On Thu 23-08-18 15:06:08, James Morse wrote:
> > [...]
> >> My best-guess is that pfn_valid_within() shouldn't be optimised out if
> >> ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.
> >>
> >> Does something like this solve the problem?:
> >> ============================%<============================
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index 32699b2dc52a..5e27095a15f4 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
> >>   * pfn_valid_within() should be used in this case; we optimise this away
> >>   * when we have no holes within a MAX_ORDER_NR_PAGES block.
> >>   */
> >> -#ifdef CONFIG_HOLES_IN_ZONE
> >> +#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
> >>  #define pfn_valid_within(pfn) pfn_valid(pfn)
> >>  #else
> >>  #define pfn_valid_within(pfn) (1)
> >> ============================%<============================
> 
> After plenty of grepping, git-archaeology and help from others, I think I've a
> clearer picture of what these options do.
> 
> 
> Please correct me if I've explained something wrong here:
> 
> > This is the first time I hear about CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.
> 
> The comment in include/linux/mmzone.h describes this as relevant when parts of the
> memmap have been free()d. This would happen on systems where memory is smaller
> than a sparsemem-section, and the extra struct pages are expensive.
> pfn_valid() on these systems returns true for the whole sparsemem-section, so an
> extra memmap_valid_within() check is needed.
> 
> This is independent of nomap, and isn't relevant on arm64 as our pfn_valid()
> always tests the page in memblock due to nomap pages, which can occur anywhere.
> (I will propose a patch removing ARCH_HAS_HOLES_MEMORYMODEL for arm64.)
> 
> 
> HOLES_IN_ZONE is similar, if some memory is smaller than MAX_ORDER_NR_PAGES,
> possibly due to nomap holes.
> 
> 6d526ee26ccd only enabled it for NUMA systems on arm64, because the NUMA code
> was first to fall foul of this, but there is nothing NUMA specific about nomap
> holes within a MAX_ORDER_NR_PAGES region.
> 
> I'm convinced arm64 should always enable HOLES_IN_ZONE because nomap pages can
> occur anywhere. I'll post a fix.

But x86 had the same bug -
https://bugzilla.redhat.com/show_bug.cgi?id=1598462

And x86 fixed it without enabling HOLES_IN_ZONE. On x86, the BIOS can also 
reserve any memory range - so you can have arbitrary holes there that are 
not predictable when the kernel is compiled.

Currently HOLES_IN_ZONE is selected only for ia64, mips/octeon - so does 
it mean that all the other architectures don't have holes in the memory 
map?

What would be an architecture-independent way to handle the holes?

Mikulas


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-30 15:58             ` Mikulas Patocka
@ 2018-08-30 16:11               ` Will Deacon
  2018-08-30 16:25               ` James Morse
  1 sibling, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-08-30 16:11 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: James Morse, Michal Hocko, Catalin Marinas, linux-arm-kernel,
	linux-mm, Pavel Tatashin, Ard Biesheuvel

On Thu, Aug 30, 2018 at 11:58:19AM -0400, Mikulas Patocka wrote:
> On Wed, 29 Aug 2018, James Morse wrote:
> > HOLES_IN_ZONE is similar, if some memory is smaller than MAX_ORDER_NR_PAGES,
> > possibly due to nomap holes.
> > 
> > 6d526ee26ccd only enabled it for NUMA systems on arm64, because the NUMA code
> > was first to fall foul of this, but there is nothing NUMA specific about nomap
> > holes within a MAX_ORDER_NR_PAGES region.
> > 
> > I'm convinced arm64 should always enable HOLES_IN_ZONE because nomap pages can
> > occur anywhere. I'll post a fix.
> 
> But x86 had the same bug -
> https://bugzilla.redhat.com/show_bug.cgi?id=1598462

Yeah, that's not readable and lkml.org is down. Any idea what x86 did?

> And x86 fixed it without enabling HOLES_IN_ZONE. On x86, the BIOS can also 
> reserve any memory range - so you can have arbitrary holes there that are 
> not predictable when the kernel is compiled.

What happens when the BIOS reserves a page on x86? Is it still mapped by
the kernel (and therefore has a valid struct page) or is it treated like
NOMAP?

> Currently HOLES_IN_ZONE is selected only for ia64, mips/octeon - so does 
> it mean that all the other architectures don't have holes in the memory 
> map?

Possibly. Note also that arm64 already selects HOLES_IN_ZONE if NUMA.

> What should be architecture-independent way how to handle the holes?

Until firmware is architecture-independent, I think handling this
generically is a lost cause.

Will


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-30 15:58             ` Mikulas Patocka
  2018-08-30 16:11               ` Will Deacon
@ 2018-08-30 16:25               ` James Morse
  1 sibling, 0 replies; 18+ messages in thread
From: James Morse @ 2018-08-30 16:25 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Michal Hocko, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin, Ard Biesheuvel

Hi Mikulas,

On 30/08/18 16:58, Mikulas Patocka wrote:
> On Wed, 29 Aug 2018, James Morse wrote:
>> On 24/08/18 12:41, Michal Hocko wrote:
>>> On Thu 23-08-18 15:06:08, James Morse wrote:
>>> [...]
>>>> My best-guess is that pfn_valid_within() shouldn't be optimised out if
>>>> ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.
>>>>
>>>> Does something like this solve the problem?:
>>>> ============================%<============================
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index 32699b2dc52a..5e27095a15f4 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
>>>>   * pfn_valid_within() should be used in this case; we optimise this away
>>>>   * when we have no holes within a MAX_ORDER_NR_PAGES block.
>>>>   */
>>>> -#ifdef CONFIG_HOLES_IN_ZONE
>>>> +#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
>>>>  #define pfn_valid_within(pfn) pfn_valid(pfn)
>>>>  #else
>>>>  #define pfn_valid_within(pfn) (1)
>>>> ============================%<============================
>>
>> After plenty of grepping, git-archaeology and help from others, I think I've a
>> clearer picture of what these options do.
>>
>>
>> Please correct me if I've explained something wrong here:
>>
>>> This is the first time I hear about CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.
>>
>> The comment in include/linux/mmzone.h describes this as relevant when parts of the
>> memmap have been free()d. This would happen on systems where memory is smaller
>> than a sparsemem-section, and the extra struct pages are expensive.
>> pfn_valid() on these systems returns true for the whole sparsemem-section, so an
>> extra memmap_valid_within() check is needed.
>>
>> This is independent of nomap, and isn't relevant on arm64 as our pfn_valid()
>> always tests the page in memblock due to nomap pages, which can occur anywhere.
>> (I will propose a patch removing ARCH_HAS_HOLES_MEMORYMODEL for arm64.)
>>
>>
>> HOLES_IN_ZONE is similar, if some memory is smaller than MAX_ORDER_NR_PAGES,
>> possibly due to nomap holes.
>>
>> 6d526ee26ccd only enabled it for NUMA systems on arm64, because the NUMA code
>> was first to fall foul of this, but there is nothing NUMA specific about nomap
>> holes within a MAX_ORDER_NR_PAGES region.
>>
>> I'm convinced arm64 should always enable HOLES_IN_ZONE because nomap pages can
>> occur anywhere. I'll post a fix.
> 
> But x86 had the same bug -
> https://bugzilla.redhat.com/show_bug.cgi?id=1598462

(Context: e181ae0c5db "mm: zero unavailable pages before memmap init")

It's the same symptom, but not quite the same bug.


> And x86 fixed it without enabling HOLES_IN_ZONE. On x86, the BIOS can also 
> reserve any memory range - so you can have arbitrary holes there that are 
> not predictable when the kernel is compiled.

x86's pfn_valid() says the struct-page is accessible; the problem was that it
wasn't initialized correctly.

On arm64 pfn_valid() says these struct-pages are not accessible. The problem was
the pfn_valid_within()->pfn_valid() calls being removed, causing the
uninitialized struct-page to be accessed.


> Currently HOLES_IN_ZONE is selected only for ia64, mips/octeon - so does 
> it mean that all the other architectures don't have holes in the memory 
> map?

I think there is just more than one way of handling these, depending on whether
holes have struct-pages and what pfn_valid() reports for them.


> What should be architecture-independent way how to handle the holes?

We already diverge with e820/memblock. I'm not sure what the x86 holes
correspond to, but on arm64 these are holes in the linear-map because the
corresponding memory needs mapping with particular attributes, and we can't
mix-and-match.


Thanks,

James


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-08-29 17:37           ` James Morse
  2018-08-30 15:58             ` Mikulas Patocka
@ 2018-09-03 19:33             ` Michal Hocko
  2018-09-07 17:47               ` James Morse
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-09-03 19:33 UTC (permalink / raw)
  To: James Morse
  Cc: Mikulas Patocka, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin, Ard Biesheuvel

On Wed 29-08-18 18:37:55, James Morse wrote:
> Hi Michal,
> 
> (CC: +Ard)
> 
> On 24/08/18 12:41, Michal Hocko wrote:
> > On Thu 23-08-18 15:06:08, James Morse wrote:
> > [...]
> >> My best-guess is that pfn_valid_within() shouldn't be optimised out if
> >> ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.
> >>
> >> Does something like this solve the problem?:
> >> ============================%<============================
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index 32699b2dc52a..5e27095a15f4 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
> >>   * pfn_valid_within() should be used in this case; we optimise this away
> >>   * when we have no holes within a MAX_ORDER_NR_PAGES block.
> >>   */
> >> -#ifdef CONFIG_HOLES_IN_ZONE
> >> +#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
> >>  #define pfn_valid_within(pfn) pfn_valid(pfn)
> >>  #else
> >>  #define pfn_valid_within(pfn) (1)
> >> ============================%<============================
> 
> After plenty of grepping, git-archaeology and help from others, I think I've a
> clearer picture of what these options do.
> 
> 
> Please correct me if I've explained something wrong here:
> 
> > This is the first time I hear about CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.
> 
> The comment in include/linux/mmzone.h describes this as relevant when parts of the
> memmap have been free()d. This would happen on systems where memory is smaller
> than a sparsemem-section, and the extra struct pages are expensive.
> pfn_valid() on these systems returns true for the whole sparsemem-section, so an
> extra memmap_valid_within() check is needed.

I am having a hard time finding the actual code that does this partial memmap
initialization.

> This is independent of nomap, and isn't relevant on arm64 as our pfn_valid()
> always tests the page in memblock due to nomap pages, which can occur anywhere.
> (I will propose a patch removing ARCH_HAS_HOLES_MEMORYMODEL for arm64.)

It seems ARCH_HAS_HOLES_MEMORYMODEL is only defined for arm and arm64.
Is it really needed for arm?

> HOLES_IN_ZONE is similar, if some memory is smaller than MAX_ORDER_NR_PAGES,
> possibly due to nomap holes.
> 
> 6d526ee26ccd only enabled it for NUMA systems on arm64, because the NUMA code
> was first to fall foul of this, but there is nothing NUMA specific about nomap
> holes within a MAX_ORDER_NR_PAGES region.
> 
> I'm convinced arm64 should always enable HOLES_IN_ZONE because nomap pages can
> occur anywhere. I'll post a fix.
> 
> 
> Is it valid to have HOLES_IN_ZONE and !HAVE_ARCH_PFN_VALID?
> This would mean pfn_valid_within() is necessary, but pfn_valid() is only looking
> at sparse-sections. It looks like ia64 and mips:CAVIUM_OCTEON_SOC are both
> configured like this...

This smells suspicious and I wouldn't be surprised if this was some
leftover from the past.
-- 
Michal Hocko
SUSE Labs


* Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
  2018-09-03 19:33             ` Michal Hocko
@ 2018-09-07 17:47               ` James Morse
  0 siblings, 0 replies; 18+ messages in thread
From: James Morse @ 2018-09-07 17:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mikulas Patocka, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-mm, Pavel Tatashin, Ard Biesheuvel, Russell King

Hi Michal,

(CC: +Russell, we're trying to work out if ARCH_HAS_HOLES_MEMORYMODEL is still
necessary)

On 03/09/18 20:33, Michal Hocko wrote:
> On Wed 29-08-18 18:37:55, James Morse wrote:
>> On 24/08/18 12:41, Michal Hocko wrote:
>>> On Thu 23-08-18 15:06:08, James Morse wrote:
>>> [...]
>>>> My best-guess is that pfn_valid_within() shouldn't be optimised out if
>>>> ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.

>> After plenty of grepping, git-archaeology and help from others, I think I've a
>> clearer picture of what these options do.
>>
>> Please correct me if I've explained something wrong here:
>>
>>> This is the first time I hear about CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.
>>
>> The comment in include/linux/mmzone.h describes this as relevant when parts of the
>> memmap have been free()d. This would happen on systems where memory is smaller
>> than a sparsemem-section, and the extra struct pages are expensive.
>> pfn_valid() on these systems returns true for the whole sparsemem-section, so an
>> extra memmap_valid_within() check is needed.
> 
> I have hard times to find an actual code that does this partial memmap
> initialization.

arch/arm64/mm/init.c:free_unused_memmap(), once it has walked all the memblocks,
does this with the space after the last one:
|#ifdef CONFIG_SPARSEMEM
|	if (!IS_ALIGNED(prev_end, PAGES_PER_SECTION))
|		free_memmap(prev_end, ALIGN(prev_end, PAGES_PER_SECTION));
|#endif

prev_end is the pfn of the end of the last memblock, rounded up to
MAX_ORDER_NR_PAGES. If this isn't aligned to a section boundary, whole pages of
memmap between prev_end and the section boundary are freed.
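
The tail calculation described above can be modelled in user space. This is a sketch under assumptions: the alignment macros are local re-implementations for power-of-two sizes, and struct tail_range, tail_memmap_to_free() and the pfn values in the note below are hypothetical, chosen only to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two alignment helpers modelled on the kernel's ALIGN()/IS_ALIGNED(). */
#define TAIL_ALIGN(x, a)	(((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))
#define TAIL_IS_ALIGNED(x, a)	(((x) & ((uint64_t)(a) - 1)) == 0)

struct tail_range {
	uint64_t start_pfn;	/* first pfn whose struct page would be freed */
	uint64_t end_pfn;	/* one past the last such pfn */
};

/* Computes the pfn range of memmap that would be freed after the last memblock. */
static struct tail_range tail_memmap_to_free(uint64_t prev_end,
					     uint64_t pages_per_section)
{
	struct tail_range r = { prev_end, prev_end };	/* empty by default */

	/* The masks above only work for a non-zero power-of-two alignment. */
	assert(pages_per_section &&
	       !(pages_per_section & (pages_per_section - 1)));

	if (!TAIL_IS_ALIGNED(prev_end, pages_per_section))
		r.end_pfn = TAIL_ALIGN(prev_end, pages_per_section);
	return r;
}
```

For instance, with a hypothetical 0x40000 pages per section and prev_end at pfn 0xbfc00, the struct pages for pfns 0xbfc00 up to 0xc0000 would be freed even though pfn_valid() on a plain sparsemem check would still cover the whole section. If prev_end is already section aligned, the range is empty.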

(The memblock walker does something similar for the gaps between memblocks)


>> This is independent of nomap, and isn't relevant on arm64 as our pfn_valid()
>> always tests the page in memblock due to nomap pages, which can occur anywhere.
>> (I will propose a patch removing ARCH_HAS_HOLES_MEMORYMODEL for arm64.)
> 
> It seems ARCH_HAS_HOLES_MEMORYMODEL is only defined for arm and arm64.
> Is it really needed for arm?

I don't know much about arch/arm, but from grepping around: arch/arm does the
same thing as above with its free_unused_memmap(), so this partial memmap
initialisation can happen.

For 32-bit, ARCH_HAS_HOLES_MEMORYMODEL is something that different boards/platforms
opt into. But to match the partial memmap-initialisation case above, it should be
selected if SPARSEMEM. Doing this would make HAVE_ARCH_PFN_VALID always true,
meaning the checks ARCH_HAS_HOLES_MEMORYMODEL enables never need running because
pfn_valid() already does them, at which point it can be removed.

The way it is makes sense if each board/platform knows where/how-much memory it
will have and can size FORCE_MAX_ZONEORDER so it doesn't get holes. But doesn't
this stuff all come from DT nowadays?

I think arch/arm should select ARCH_HAS_HOLES_MEMORYMODEL if USE_OF, but I don't
think this extra configurability is useful. Selecting it unconditionally would
let us remove it.


Digging through the history I think the original commit:
eb33575cf67d ("[ARM] Double check memmap is actually valid with a memmap has
unexpected holes V2")
Was working around the pfn_valid() behaviour that was changed with:
7b7bf499f79d ("ARM: 6913/1: sparsemem: allow pfn_valid to be overridden when
using SPARSEMEM")

The two users that describe their memory layout just want HAVE_ARCH_PFN_VALID:
59f181aa9d633 ("ARM: brcmstb: Enable ARCH_HAS_HOLES_MEMORYMODEL")
e511333212de4 ("ARM: highbank: select ARCH_HAS_HOLES_MEMORYMODEL")


Thanks,

James


end of thread, other threads:[~2018-09-07 17:47 UTC | newest]

Thread overview: 18+ messages
2018-08-17 19:44 A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory Mikulas Patocka
2018-08-21 10:44 ` Michal Hocko
2018-08-21 12:58   ` James Morse
2018-08-23 11:02     ` Mikulas Patocka
2018-08-23 11:10       ` Michal Hocko
2018-08-23 11:16         ` Mikulas Patocka
2018-08-23 11:23           ` Michal Hocko
2018-08-23 13:13             ` Pasha Tatashin
2018-08-23 13:14               ` Pasha Tatashin
2018-08-23 14:34               ` Mikulas Patocka
2018-08-23 14:06       ` James Morse
2018-08-24 11:41         ` Michal Hocko
2018-08-29 17:37           ` James Morse
2018-08-30 15:58             ` Mikulas Patocka
2018-08-30 16:11               ` Will Deacon
2018-08-30 16:25               ` James Morse
2018-09-03 19:33             ` Michal Hocko
2018-09-07 17:47               ` James Morse
