LKML Archive on lore.kernel.org
From: Qian Cai <cai@lca.pw>
To: Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	David Hildenbrand <david@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>, Baoquan He <bhe@redhat.com>
Subject: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
Date: Thu, 23 Apr 2020 17:25:56 -0400
Message-ID: <8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw> (raw)

Compaction started to crash as below on today's linux-next. The faulty page belongs to the Node 0 DMA32 zone.
I'll continue to narrow it down, but I just want to give a heads-up in case someone can beat me to it.

Debug output from free_area_init_core()
[    0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
[    0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
[    0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
[    0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL

I don't understand how it could end up in such a situation. A few recent patches look
more related than others.

- mm: rework free_area_init*() functions
https://lore.kernel.org/linux-mm/20200412194859.12663-1-rppt@kernel.org/
Could this somehow allow an invalid pfn to escape into the page allocator?
In particular, is it related to skipping the checks in memmap_init_zone()?
https://lore.kernel.org/linux-mm/20200412194859.12663-16-rppt@kernel.org

# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 16 17
node 0 size: 7951 MB
node 0 free: 4445 MB
node 1 cpus: 2 3 18 19
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus: 4 5 20 21
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 6 7 22 23
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 8 9 24 25
node 4 size: 15354 MB
node 4 free: 78 MB
node 5 cpus: 10 11 26 27
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus: 12 13 28 29
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 14 15 30 31
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  16  16  16  32  32  32  32 
  1:  16  10  16  16  32  32  32  32 
  2:  16  16  10  16  32  32  32  32 
  3:  16  16  16  10  32  32  32  32 
  4:  32  32  32  32  10  16  16  16 
  5:  32  32  32  32  16  10  16  16 
  6:  32  32  32  32  16  16  10  16 
  7:  32  32  32  32  16  16  16  10

[ 6803.941550] LTP: starting swapping01 (swapping01 -i 5)
[ 6821.098489] page:ffffea0000aa0000 refcount:1 mapcount:0 mapping:000000002243743b index:0x0
[ 6821.107077] flags: 0x1fffe000001000(reserved)
[ 6821.111534] raw: 001fffe000001000 ffffea0000aa0008 ffffea0000aa0008 0000000000000000
[ 6821.119365] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 6821.127167] page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
[ 6821.135399] page_owner info is not present (never set?)
[ 6821.140717] ------------[ cut here ]------------
[ 6821.145372] kernel BUG at mm/page_alloc.c:533!
[ 6821.150075] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 6821.150083] irq event stamp: 10075005
[ 6821.150102] hardirqs last  enabled at (10075005): [<ffffffff99ea403f>] do_page_fault+0x45f/0x9d7
[ 6821.156829] CPU: 17 PID: 218 Comm: kcompactd0 Not tainted 5.7.0-rc2-next-20200423+ #7
[ 6821.160522] hardirqs last disabled at (10075004): [<ffffffff99e03ed1>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 6821.160535] softirqs last  enabled at (10067158): [<ffffffff9ac00478>] __do_softirq+0x478/0x77f
[ 6821.169366] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[ 6821.169378] RIP: 0010:set_pfnblock_flags_mask+0x150/0x210
[ 6821.177257] softirqs last disabled at (10067149): [<ffffffff99ed22a6>] irq_exit+0xd6/0xf0
[ 6821.218362] Code: 1a 49 8d 7f 38 e8 80 ee 05 00 48 8b 45 a0 48 8b 55 a8 48 03 50 78 49 39 d4 72 1d 48 c7 c6 80 ee ef 9a 4c 89 ef e8 70 73 fb ff <0f> 0b 48 c7 c7 40 50 75 9b e8 c4 09 2b 00 4c 8b 65 d0 b9 3f 00 00
[ 6821.237457] RSP: 0018:ffffc900042ff858 EFLAGS: 00010282
[ 6821.242719] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9a002382
[ 6821.249900] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8884535b8e6c
[ 6821.257384] RBP: ffffc900042ff8b8 R08: ffffed108a6b8459 R09: ffffed108a6b8459
[ 6821.264566] R10: ffff8884535c22c7 R11: ffffed108a6b8458 R12: 000000000002a800
[ 6821.271748] R13: ffffea0000aa0000 R14: ffff88847fff3000 R15: ffff88847fff3040
[ 6821.278930] FS:  0000000000000000(0000) GS:ffff888453580000(0000) knlGS:0000000000000000
[ 6821.287318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6821.293102] CR2: 00007fd1eb4a1000 CR3: 000000083154c000 CR4: 00000000003406e0
[ 6821.300283] Call Trace:
[ 6821.302752]  isolate_freepages+0xb20/0x1140
[ 6821.307167]  ? isolate_freepages_block+0x730/0x730
[ 6821.311993]  ? mark_held_locks+0x34/0xb0
[ 6821.315942]  ? free_unref_page+0x7d/0x90
[ 6821.319891]  ? free_unref_page+0x7d/0x90
[ 6821.323842]  ? check_flags.part.28+0x86/0x220
[ 6821.328234]  compaction_alloc+0xdd/0x100
[ 6821.332401]  migrate_pages+0x304/0x17e0
[ 6821.336277]  ? __ClearPageMovable+0x100/0x100
[ 6821.340674]  ? isolate_freepages+0x1140/0x1140
[ 6821.345153]  compact_zone+0x1249/0x1e90
[ 6821.349020]  ? compaction_suitable+0x260/0x260
[ 6821.353494]  kcompactd_do_work+0x231/0x650
[ 6821.357873]  ? sysfs_compact_node+0x80/0x80
[ 6821.362088]  ? finish_wait+0xe6/0x110
[ 6821.365775]  kcompactd+0x162/0x490
[ 6821.369202]  ? kcompactd_do_work+0x650/0x650
[ 6821.373501]  ? finish_wait+0x110/0x110
[ 6821.377280]  ? __kasan_check_read+0x11/0x20
[ 6821.381693]  ? __kthread_parkme+0xd4/0xf0
[ 6821.385729]  ? kcompactd_do_work+0x650/0x650
[ 6821.390027]  kthread+0x1f7/0x220
[ 6821.393280]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 6821.398369]  ret_from_fork+0x27/0x50
[ 6821.401968] Modules linked in: brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_amd ses kvm enclosure dax_pmem dax_pmem_core irqbypass acpi_cpufreq ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 6821.426127] ---[ end trace 9783087562801ccf ]---
[ 6821.430800] RIP: 0010:set_pfnblock_flags_mask+0x150/0x210
[ 6821.436410] Code: 1a 49 8d 7f 38 e8 80 ee 05 00 48 8b 45 a0 48 8b 55 a8 48 03 50 78 49 39 d4 72 1d 48 c7 c6 80 ee ef 9a 4c 89 ef e8 70 73 fb ff <0f> 0b 48 c7 c7 40 50 75 9b e8 c4 09 2b 00 4c 8b 65 d0 b9 3f 00 00
[ 6821.455319] RSP: 0018:ffffc900042ff858 EFLAGS: 00010282
[ 6821.460863] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9a002382
[ 6821.468063] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8884535b8e6c
[ 6821.475245] RBP: ffffc900042ff8b8 R08: ffffed108a6b8459 R09: ffffed108a6b8459
[ 6821.482675] R10: ffff8884535c22c7 R11: ffffed108a6b8458 R12: 000000000002a800
[ 6821.489877] R13: ffffea0000aa0000 R14: ffff88847fff3000 R15: ffff88847fff3040
[ 6821.497062] FS:  0000000000000000(0000) GS:ffff888453580000(0000) knlGS:0000000000000000
[ 6821.505218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6821.511284] CR2: 00007fd1eb4a1000 CR3: 000000083154c000 CR4: 00000000003406e0
[ 6821.518487] Kernel panic - not syncing: Fatal exception
[ 6821.523876] Kernel Offset: 0x18e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 6821.534915] ---[ end Kernel panic - not syncing: Fatal exception ]---

Thread overview: 61+ messages
2020-04-23 21:25 Qian Cai [this message]
2020-04-24  3:43 ` Baoquan He
2020-04-24 13:45   ` Qian Cai
2020-05-05 12:43     ` Baoquan He
2020-05-05 13:20       ` Qian Cai
2020-05-11  1:21         ` Baoquan He
2020-04-26 14:41 ` Mike Rapoport
2020-04-27 13:45   ` Qian Cai
2020-11-21 19:45 ` [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask Andrea Arcangeli
2020-11-21 19:45   ` [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Andrea Arcangeli
2020-11-21 19:53     ` Andrea Arcangeli
2020-11-23 11:26       ` David Hildenbrand
2020-11-23 13:01     ` Vlastimil Babka
2020-11-24 13:32       ` Mel Gorman
2020-11-24 20:56         ` Andrea Arcangeli
2020-11-25 10:30           ` Mel Gorman
2020-11-25 17:59             ` Andrea Arcangeli
2020-11-26 10:47               ` Mel Gorman
2020-12-06  2:26                 ` Andrea Arcangeli
2020-12-06 23:47                   ` Mel Gorman
2020-11-25  5:34       ` Andrea Arcangeli
2020-11-25  6:45         ` David Hildenbrand
2020-11-25  8:51           ` Mike Rapoport
2020-11-25 10:39           ` Mel Gorman
2020-11-25 11:04             ` David Hildenbrand
2020-11-25 11:41               ` David Hildenbrand
2020-11-25 18:47                 ` Andrea Arcangeli
2020-11-25 13:33               ` Mel Gorman
2020-11-25 13:41                 ` David Hildenbrand
2020-11-25 18:28           ` Andrea Arcangeli
2020-11-25 19:27             ` David Hildenbrand
2020-11-25 20:41               ` Andrea Arcangeli
2020-11-25 21:13                 ` David Hildenbrand
2020-11-25 21:04               ` Mike Rapoport
2020-11-25 21:38                 ` Andrea Arcangeli
2020-11-26  9:36                   ` Mike Rapoport
2020-11-26 10:05                     ` David Hildenbrand
2020-11-26 17:46                       ` Mike Rapoport
2020-11-29 12:32                         ` Mike Rapoport
2020-12-02  0:44                           ` Andrea Arcangeli
2020-12-02 17:39                             ` Mike Rapoport
2020-12-03  6:23                               ` Andrea Arcangeli
2020-12-03 10:51                                 ` Mike Rapoport
2020-12-03 17:31                                   ` Andrea Arcangeli
2020-12-06  8:09                                     ` Mike Rapoport
2020-11-26 18:15                       ` Andrea Arcangeli
2020-11-26 18:29                     ` Andrea Arcangeli
2020-11-26 19:44                       ` Mike Rapoport
2020-11-26 20:30                         ` Andrea Arcangeli
2020-11-26 21:03                           ` Mike Rapoport
2020-11-26 19:21                     ` Andrea Arcangeli
2020-11-25 12:08         ` Vlastimil Babka
2020-11-25 13:32           ` David Hildenbrand
2020-11-25 14:13             ` Mike Rapoport
2020-11-25 14:42               ` David Hildenbrand
2020-11-26 10:51                 ` Mel Gorman
2020-11-25 19:14               ` Andrea Arcangeli
2020-11-25 19:01           ` Andrea Arcangeli
2020-11-25 19:33             ` David Hildenbrand
2020-11-26  3:40         ` Andrea Arcangeli
2020-11-26 10:43           ` Mike Rapoport
