linux-mm.kvack.org archive mirror
* isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689!
@ 2019-05-01 23:49 Dexuan Cui
  2019-05-02 12:55 ` Michal Hocko
  2019-05-08 10:33 ` Mel Gorman
  0 siblings, 2 replies; 5+ messages in thread
From: Dexuan Cui @ 2019-05-01 23:49 UTC
  To: linux-mm, Andrew Morton, Kirill Tkhai
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Roman Gushchin,
	Hugh Dickins, Andrey Ryabinin, Mel Gorman, dchinner, Greg Thelen,
	Kuo-Hsin Yang

Hi,
Today I got the BUG below in isolate_lru_pages() while building the kernel.

My currently running kernel, which exhibits the BUG, is based on mainline commit
262d6a9a63a3 ("Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").

It looks like nobody else has reported the issue recently.

So far I have hit the BUG only once and I don't know how to reproduce it, so this is just an FYI.

Thanks,
-- Dexuan

The crash log is:

[ 1626.194411] ------------[ cut here ]------------
[ 1626.197031] kernel BUG at mm/vmscan.c:1689!
[ 1626.197031] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[ 1626.207112] CPU: 2 PID: 86 Comm: kswapd0 Not tainted 5.0.0+ #67
[ 1626.207112] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[ 1626.207112] RIP: 0010:isolate_lru_pages+0x4ab/0x4c0
[ 1626.207112] Code: e8 6a bc f1 ff 85 c0 75 e0 48 c7 c2 40 dc 03 af be 41 01 00 00 48 c7 c7 de ef 06 af c6 05 2e 6c 17 01 01 e8 d2 8c ef ff eb bf <0f> 0b e8 be 50 e9 ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f
[ 1626.245025] RSP: 0000:ffffa051c0c73ac8 EFLAGS: 00010082
[ 1626.258863] RAX: 00000000ffffffea RBX: ffff8cdb229afc20 RCX: dead000000000200
[ 1626.258863] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe0f803543000
[ 1626.258863] RBP: 000000000000000c R08: ffffe0f8033c2708 R09: 0000000000000002
[ 1626.258863] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000000b
[ 1626.258863] R13: 000000000000000b R14: ffffa051c0c73de0 R15: ffffe0f803543008
[ 1626.258863] FS:  0000000000000000(0000) GS:ffff8cdb43280000(0000) knlGS:0000000000000000
[ 1626.258863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1626.303683] CR2: 0000563ea696da18 CR3: 00000000e07f0005 CR4: 00000000003606e0
[ 1626.303683] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1626.303683] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1626.303683] Call Trace:
[ 1626.327319]  shrink_inactive_list+0xf9/0x700
[ 1626.327319]  ? __lock_acquire+0x42d/0x1190
[ 1626.327319]  ? inactive_list_is_low+0x77/0x2c0
[ 1626.327319]  shrink_node_memcg+0x206/0x780
[ 1626.327319]  ? percpu_ref_put_many+0x8c/0x130
[ 1626.327319]  ? percpu_ref_put_many+0x8c/0x130
[ 1626.327319]  shrink_node+0xcf/0x470
[ 1626.355957]  balance_pgdat+0x2d9/0x560
[ 1626.355957]  kswapd+0x263/0x560
[ 1626.355957]  ? finish_wait+0x80/0x80
[ 1626.355957]  ? balance_pgdat+0x560/0x560
[ 1626.355957]  kthread+0x11b/0x140
[ 1626.373956]  ? kthread_create_on_node+0x60/0x60
[ 1626.373956]  ret_from_fork+0x24/0x30
[ 1626.373956] Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw hyperv_fb evdev autofs4 hid_generic hid_hyperv hv_netvsc hyperv_keyboard hid psmouse i2c_piix4 hv_vmbus atkbd i2c_core
[ 1626.373956] ---[ end trace b148bf262999856d ]---
[ 1626.373956] RIP: 0010:isolate_lru_pages+0x4ab/0x4c0
[ 1626.373956] Code: e8 6a bc f1 ff 85 c0 75 e0 48 c7 c2 40 dc 03 af be 41 01 00 00 48 c7 c7 de ef 06 af c6 05 2e 6c 17 01 01 e8 d2 8c ef ff eb bf <0f> 0b e8 be 50 e9 ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f
[ 1626.373956] RSP: 0000:ffffa051c0c73ac8 EFLAGS: 00010082
[ 1626.373956] RAX: 00000000ffffffea RBX: ffff8cdb229afc20 RCX: dead000000000200
[ 1626.373956] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe0f803543000
[ 1626.373956] RBP: 000000000000000c R08: ffffe0f8033c2708 R09: 0000000000000002
[ 1626.373956] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000000b
[ 1626.373956] R13: 000000000000000b R14: ffffa051c0c73de0 R15: ffffe0f803543008
[ 1626.373956] FS:  0000000000000000(0000) GS:ffff8cdb43280000(0000) knlGS:0000000000000000
[ 1626.373956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1626.373956] CR2: 0000563ea696da18 CR3: 00000000e07f0005 CR4: 00000000003606e0
[ 1626.373956] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1626.373956] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1626.373956] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:34
[ 1626.373956] in_atomic(): 1, irqs_disabled(): 1, pid: 86, name: kswapd0
[ 1626.373956] INFO: lockdep is turned off.
[ 1626.373956] irq event stamp: 998186
[ 1626.373956] hardirqs last  enabled at (998185): [<ffffffffae830669>] _raw_spin_unlock_irq+0x29/0x50
[ 1626.373956] hardirqs last disabled at (998186): [<ffffffffae8303ff>] _raw_spin_lock_irq+0xf/0x40
[ 1626.520571] softirqs last  enabled at (993564): [<ffffffffaec0038b>] __do_softirq+0x38b/0x498
[ 1626.520571] softirqs last disabled at (993557): [<ffffffffae0796db>] irq_exit+0xdb/0xf0
[ 1626.520571] Preemption disabled at:
[ 1626.520571] [<0000000000000000>]           (null)
[ 1626.520571] CPU: 2 PID: 86 Comm: kswapd0 Tainted: G      D           5.0.0+ #67
[ 1626.520571] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[ 1626.520571] Call Trace:
[ 1626.520571]  dump_stack+0x67/0x90
[ 1626.520571]  ___might_sleep.cold.78+0xf0/0x104
[ 1626.520571]  exit_signals+0x30/0x2d0
[ 1626.520571]  ? finish_wait+0x80/0x80
[ 1626.520571]  do_exit+0xb0/0xc90
[ 1626.520571]  ? balance_pgdat+0x560/0x560
[ 1626.520571]  ? kthread+0x11b/0x140
[ 1626.520571]  rewind_stack_do_exit+0x17/0x20
[ 1626.579941] note: kswapd0[86] exited with preempt_count 1
[ 1691.170873] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1691.174869] rcu:     5-...0: (6 ticks this GP) idle=4e2/1/0x4000000000000000 softirq=19854/19854 fqs=8114 last_accelerate: 0dd1/4d4e, Nonlazy posted: .L.
[ 1691.174869] rcu:     (detected by 1, t=16254 jiffies, g=163449, q=3759)
[ 1691.174869] Sending NMI from CPU 1 to CPUs 5:
[ 1691.174869] NMI backtrace for cpu 5
[ 1691.174869] CPU: 5 PID: 6477 Comm: ld Tainted: G      D W         5.0.0+ #67
[ 1691.174869] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[ 1691.174869] RIP: 0010:queued_spin_lock_slowpath+0x2b/0x1e0
[ 1691.174869] Code: 1f 44 00 00 41 54 55 53 48 89 fb 0f 1f 44 00 00 ba 01 00 00 00 8b 03 85 c0 75 0d f0 0f b1 13 85 c0 75 f2 5b 5d 41 5c c3 f3 90 <eb> e9 81 fe 00 01 00 00 74 44 81 e6 00 ff ff ff 75 71 f0 0f ba 2b
[ 1691.174869] RSP: 0018:ffffa051c8dafb88 EFLAGS: 00000002
[ 1691.174869] RAX: 0000000000000001 RBX: ffff8cdb47804b80 RCX: 605139ec00000000
[ 1691.174869] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8cdb47804b80
[ 1691.174869] RBP: 0000000000000246 R08: 0000000091512c45 R09: 0000000000000001
[ 1691.174869] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[ 1691.174869] R13: ffffffffae1d4fe0 R14: ffff8cdb47800000 R15: ffffe0f8030067c0
[ 1691.174869] FS:  00007f9b6e162b80(0000) GS:ffff8cdb43340000(0000) knlGS:0000000000000000
[ 1691.174869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1691.174869] CR2: 00007f9b5648b000 CR3: 00000000e07f0006 CR4: 00000000003606e0
[ 1691.174869] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1691.174869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1691.174869] Call Trace:
[ 1691.174869]  do_raw_spin_lock+0xab/0xb0
[ 1691.174869]  _raw_spin_lock_irqsave+0x40/0x50
[ 1691.174869]  ? pagevec_lru_move_fn+0x6c/0xd0
[ 1691.174869]  pagevec_lru_move_fn+0x6c/0xd0
[ 1691.174869]  __lru_cache_add+0x6b/0xa0
[ 1691.174869]  add_to_page_cache_lru+0x76/0xc0
[ 1691.174869]  pagecache_get_page+0xf2/0x2d0
[ 1691.174869]  grab_cache_page_write_begin+0x1c/0x40
[ 1691.174869]  ext4_da_write_begin+0xe5/0x500
[ 1691.174869]  generic_perform_write+0xf4/0x1c0
[ 1691.174869]  __generic_file_write_iter+0xfa/0x1c0
[ 1691.174869]  ? generic_write_checks+0x4c/0xb0
[ 1691.174869]  ext4_file_write_iter+0xc6/0x3f0
[ 1691.174869]  new_sync_write+0x115/0x180
[ 1691.174869]  vfs_write+0xb7/0x1b0
[ 1691.174869]  ksys_write+0x52/0xc0
[ 1691.174869]  do_syscall_64+0x5e/0x200
[ 1691.174869]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1691.174869] RIP: 0033:0x7f9b6e48afd4
[ 1691.174869] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 29 f7 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 1691.174869] RSP: 002b:00007ffcb3925b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1691.174869] RAX: ffffffffffffffda RBX: 000000000dfd4000 RCX: 00007f9b6e48afd4
[ 1691.174869] RDX: 000000000dfd4000 RSI: 00007f9b5e07a3a0 RDI: 000000000000004e
[ 1691.174869] RBP: 00007f9b5e07a3a0 R08: 000000000dfd4000 R09: 0000000000000000
[ 1691.174869] R10: 000000000000000d R11: 0000000000000246 R12: 0000563ea61763a0
[ 1691.174869] R13: 000000000dfd4000 R14: 000000000dfd4e20 R15: 00007f9b6e561760
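
One detail in the first register dump can be read directly: RAX holds
00000000ffffffea at the point of the BUG. On x86-64 an int return value comes
back in the low 32 bits of RAX, and 0xffffffea read as a signed 32-bit value is
-22, which matches -EINVAL (EINVAL is 22 on Linux). This is only suggestive,
since the compiler may have reused the register before BUG() was reached. A
minimal user-space sketch of the decoding follows; ret_from_rax(),
check_reported_rax() and LINUX_EINVAL are names made up for this illustration
and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/*
 * EINVAL is 22 on Linux. It is defined locally (hypothetical name) so the
 * sketch does not depend on the host's <errno.h>.
 */
#define LINUX_EINVAL 22

/*
 * Interpret the low 32 bits of a saved RAX value as a signed int, which is
 * how a C function returning int is read back on x86-64.
 */
static int ret_from_rax(uint64_t rax)
{
	uint32_t low = (uint32_t)rax;

	/* Subtract 2^32 by hand to avoid an implementation-defined cast. */
	if (low > (uint32_t)INT32_MAX)
		return (int)((int64_t)low - ((int64_t)1 << 32));
	return (int)low;
}

/* Check the value taken from the trace: "RAX: 00000000ffffffea". */
static void check_reported_rax(void)
{
	assert(ret_from_rax(0x00000000ffffffeaULL) == -LINUX_EINVAL);
}
```

Calling check_reported_rax() from a small test program passes silently when the
decoded value equals -22.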



* Re: isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689!
  2019-05-01 23:49 isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689! Dexuan Cui
@ 2019-05-02 12:55 ` Michal Hocko
  2019-05-02 18:24   ` Dexuan Cui
  2019-05-08 10:33 ` Mel Gorman
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2019-05-02 12:55 UTC
  To: Dexuan Cui
  Cc: linux-mm, Andrew Morton, Kirill Tkhai, Johannes Weiner,
	Vladimir Davydov, Roman Gushchin, Hugh Dickins, Andrey Ryabinin,
	Mel Gorman, dchinner, Greg Thelen, Kuo-Hsin Yang

On Wed 01-05-19 23:49:10, Dexuan Cui wrote:
> Hi,
> Today I got the below BUG in isolate_lru_pages() when building the kernel.
> 
> My current running kernel, which exhibits the BUG, is based on the mainline kernel's commit 
> 262d6a9a63a3 ("Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").
> 
> Looks nobody else reported the issue recently.
> 
> So far I only hit the BUG once and I don't know how to reproduce it again, so this is just a FYI.

This is really unexpected. This BUG means that __isolate_lru_page must
have returned -EINVAL, which implies a non-LRU page on the LRU or an
unevictable page on an evictable LRU list. I am currently travelling, so
I cannot take a deeper look. There was a similar report that triggered a
different BUG_ON in the reclaim path, but that was on a really old kernel
with out-of-tree patches, so it is not clear what happened there. Do you
think it would be possible to set up a crash dump or apply the following
debugging patch in case it reproduces?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a5ad0b35ab8e..289493986f6c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1686,6 +1686,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			continue;
 
 		default:
+			dump_page(page);
 			BUG();
 		}
 	}
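
The reasoning above can be sketched as a small user-space model. It is a
simplification under stated assumptions, not the code in mm/vmscan.c:
struct model_page, model_isolate_page(), model_scan_one() and the MODEL_*
names are hypothetical, and the single "busy" flag stands in for whatever
conditions make the real __isolate_lru_page() return -EBUSY. The point it
illustrates is that the caller handles only 0 and -EBUSY, so -EINVAL (a
non-LRU page, or an unevictable page when unevictable pages were not
requested) falls through to the default branch that holds the BUG().

```c
#include <assert.h>
#include <stdbool.h>

/* Linux errno values, defined locally under hypothetical names. */
#define MODEL_EBUSY  16
#define MODEL_EINVAL 22

/* Hypothetical stand-in for the page state the check depends on. */
struct model_page {
	bool on_lru;       /* page is flagged as being on an LRU list */
	bool unevictable;  /* page is flagged unevictable */
	bool busy;         /* placeholder for the -EBUSY conditions */
};

enum model_outcome { MODEL_ISOLATED, MODEL_PUT_BACK, MODEL_BUG };

/* Model of the per-page check: 0, -EBUSY or -EINVAL. */
static int model_isolate_page(const struct model_page *page,
			      bool allow_unevictable)
{
	if (!page->on_lru)
		return -MODEL_EINVAL;
	if (page->unevictable && !allow_unevictable)
		return -MODEL_EINVAL;
	if (page->busy)
		return -MODEL_EBUSY;
	return 0;
}

/* Model of the caller's switch: anything but 0 or -EBUSY is fatal. */
static enum model_outcome model_scan_one(const struct model_page *page,
					 bool allow_unevictable)
{
	switch (model_isolate_page(page, allow_unevictable)) {
	case 0:
		return MODEL_ISOLATED;
	case -MODEL_EBUSY:
		return MODEL_PUT_BACK;
	default:
		return MODEL_BUG;	/* where the real code calls BUG() */
	}
}

/* The two cases named in the explanation both reach the fatal branch. */
static void model_self_check(void)
{
	const struct model_page not_on_lru = { .on_lru = false };
	const struct model_page stray_unevictable = {
		.on_lru = true, .unevictable = true
	};

	assert(model_scan_one(&not_on_lru, false) == MODEL_BUG);
	assert(model_scan_one(&stray_unevictable, false) == MODEL_BUG);
}
```

If the one-line debugging change does not build as posted, one possible reason
is that dump_page() in kernels of this period is declared with a second
"const char *reason" argument, so the call may need a reason string.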

Thanks for the report.
-- 
Michal Hocko
SUSE Labs



* RE: isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689!
  2019-05-02 12:55 ` Michal Hocko
@ 2019-05-02 18:24   ` Dexuan Cui
  0 siblings, 0 replies; 5+ messages in thread
From: Dexuan Cui @ 2019-05-02 18:24 UTC
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Kirill Tkhai, Johannes Weiner,
	Vladimir Davydov, Roman Gushchin, Hugh Dickins, Andrey Ryabinin,
	Mel Gorman, dchinner, Greg Thelen, Kuo-Hsin Yang

> From: Michal Hocko <mhocko@suse.com>
> Sent: Thursday, May 2, 2019 5:55 AM
> > ...
> > So far I only hit the BUG once and I don't know how to reproduce it again, so
> this is just a FYI.
> 
> ...
> Do you think it would be possible to setup a crash dump
> or apply the following debugging patch in case it reproduces?
> Michal Hocko

I have now applied the "dump_page(page);" change; let's see whether I hit the BUG again.

BTW, I'm developing some code to support hibernation for Linux VMs running
on Hyper-V. I don't think my own changes cause the BUG, as they do not
touch the mm subsystem or any file system code at all.

Thanks,
-- Dexuan



* Re: isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689!
  2019-05-01 23:49 isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689! Dexuan Cui
  2019-05-02 12:55 ` Michal Hocko
@ 2019-05-08 10:33 ` Mel Gorman
  2019-05-08 15:44   ` Dexuan Cui
  1 sibling, 1 reply; 5+ messages in thread
From: Mel Gorman @ 2019-05-08 10:33 UTC
  To: Dexuan Cui
  Cc: linux-mm, Andrew Morton, Kirill Tkhai, Michal Hocko,
	Johannes Weiner, Vladimir Davydov, Roman Gushchin, Hugh Dickins,
	Andrey Ryabinin, dchinner, Greg Thelen, Kuo-Hsin Yang

On Wed, May 01, 2019 at 11:49:10PM +0000, Dexuan Cui wrote:
> Hi,
> Today I got the below BUG in isolate_lru_pages() when building the kernel.
> 
> My current running kernel, which exhibits the BUG, is based on the mainline kernel's commit 
> 262d6a9a63a3 ("Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").
> 
> Looks nobody else reported the issue recently.
> 

That is missing some fixes that were merged for 5.1, particularly
6b0868c820ff ("mm/compaction.c: correct zone boundary handling when
resetting pageblock skip hints"). Can you try reproducing this under 5.1
at least?

-- 
Mel Gorman
SUSE Labs



* RE: isolate_lru_pages(): kernel BUG at mm/vmscan.c:1689!
  2019-05-08 10:33 ` Mel Gorman
@ 2019-05-08 15:44   ` Dexuan Cui
  0 siblings, 0 replies; 5+ messages in thread
From: Dexuan Cui @ 2019-05-08 15:44 UTC
  To: Mel Gorman
  Cc: linux-mm, Andrew Morton, Kirill Tkhai, Michal Hocko,
	Johannes Weiner, Vladimir Davydov, Roman Gushchin, Hugh Dickins,
	Andrey Ryabinin, dchinner, Greg Thelen, Kuo-Hsin Yang

> From: Mel Gorman <mgorman@techsingularity.net>
> Sent: Wednesday, May 8, 2019 3:33 AM
> On Wed, May 01, 2019 at 11:49:10PM +0000, Dexuan Cui wrote:
> > Hi,
> > Today I got the below BUG in isolate_lru_pages() when building the kernel.
> >
> > My current running kernel, which exhibits the BUG, is based on the mainline
> kernel's commit
> > 262d6a9a63a3 ("Merge branch 'x86-urgent-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").
> >
> > Looks nobody else reported the issue recently.
> >
> That is missing some fixes that were merged for 5.1, particularly
> 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when
> resetting pageblock skip hints"). Can you try reproducing this under 5.1
> at least?
> 
> --
> Mel Gorman
> SUSE Labs

So far I have hit the issue only once, and I don't know how to reproduce it.

If I hit it again, I'll move to v5.1 with the "dump_page(page);" change.

Thanks,
-- Dexuan

