From: Qian Cai <cai@lca.pw>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: vbabka@suse.cz, Linux-MM <linux-mm@kvack.org>
Subject: Re: low-memory crash with patch "capture a page under direct compaction"
Date: Tue, 05 Mar 2019 10:13:24 -0500
Message-ID: <1551798804.7087.7.camel@lca.pw>
In-Reply-To: <20190305144234.GH9565@techsingularity.net>

On Tue, 2019-03-05 at 14:42 +0000, Mel Gorman wrote:
> On Mon, Mar 04, 2019 at 10:55:04PM -0500, Qian Cai wrote:
> > Reverting the patches below from linux-next seems to fix a crash while
> > running LTP oom01.
> > 
> > 915c005358c1 mm, compaction: Capture a page under direct compaction -fix
> > e492a5711b67 mm, compaction: capture a page under direct compaction
> > 
> > In particular, removing just this chunk seems to fix the problem.
> > 
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -2227,10 +2227,10 @@ compact_zone(struct compact_control *cc, struct
> > capture_control *capc)
> >                 }
> > 
> >                 /* Stop if a page has been captured */
> > -               if (capc && capc->page) {
> > -                       ret = COMPACT_SUCCESS;
> > -                       break;
> > -               }
> > 
> 
> It's hard to make sense of how this is connected to the bug. The
> out-of-bounds warning would have required page flags to be corrupted
> quite badly or maybe the use of an uninitialised page. How reproducible
> has this been for you? I just ran the test 100 times with UBSAN and page
> alloc debugging enabled and it completed correctly.
> 

I can reproduce this every time within 3 runs of oom01 on this x86_64 server,
but have been unable to reproduce it on arm64 and ppc64le servers so far.

# for i in `seq 1 3`; do /opt/ltp/testcases/bin/oom01 ; done
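
A minimal sketch of the same loop with per-attempt logging, in case it helps
anyone trying this elsewhere. The function name run_repro, the REPRO_LOG
variable and the repro.log default are hypothetical and not part of LTP. It
only records which attempt was running, so the failing iteration is known
even if the machine goes down in the middle of a run.

```shell
#!/bin/sh
# Hypothetical helper (not from LTP): run a reproducer command up to N times
# and note each attempt in a log file before starting it.
run_repro() {
    tries=$1; shift
    log=${REPRO_LOG:-repro.log}   # assumed log location, override as needed
    i=1
    while [ "$i" -le "$tries" ]; do
        echo "try $i: $*" >> "$log"
        sync    # flush the log entry before a possible crash
        # Record a non-zero exit status, but keep going to the next attempt.
        "$@" || echo "try $i: exit status $?" >> "$log"
        i=$((i + 1))
    done
}

# Equivalent of the one-liner above (commented out, it needs LTP installed):
# run_repro 3 /opt/ltp/testcases/bin/oom01
```

The exit status of oom01 is not what signals the bug here; the kernel log
still has to be checked after each attempt.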

Sometimes oom01 triggers different traces:

[  391.704320] SLUB: Unable to allocate memory on node -1, gfp=0x800(GFP_NOWAIT)
[  391.737794]   cache: kmalloc-64, object size: 64, buffer size: 416, default order: 2, min order: 0
[  391.778079]   node 0: slabs: 5999, objs: 232851, free: 16
[  391.802926]   node 1: slabs: 4303, objs: 167067, free: 37
[  499.866479] ------------[ cut here ]------------
[  499.866500] BUG: Bad page state in process oom01  pfn:fffffe7a09fffd07
[  499.890013] kernel BUG at mm/page_alloc.c:3124!
[  499.935430] double fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  499.971334] CPU: 0 PID: 1623 Comm: oom01 Tainted: G        W 5.0.0-next-20190305+ #49
[  499.992805] ================================================================================
[  500.009887] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
[  500.009901] RIP: 0010:check_memory_region+0x10/0x1e0
[  500.048252] UBSAN: Undefined behaviour in kernel/locking/qspinlock.c:138:9
[  500.085378] Code: 00 00 00 48 89 e5 e8 ff 3e 9f 00 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 85 f6 0f 84 68 01 00 00 55 0f b6 d2 48 89 e5 <41> 55 41 54 53 e9 b3 00 00 00 48 b8 00 00 00 00 00 00 00 ff 48 39
[  500.107608] index 8190 is out of range for type 'long unsigned int [256]'
[  500.138462] RSP: 0000:ffff888428f80000 EFLAGS: 00010002
[  500.223186] CPU: 42 PID: 0 Comm: swapper/42 Tainted: G        W 5.0.0-next-20190305+ #49
[  500.253922] RAX: ffff88827fff41c0 RBX: ffff88827fff41c8 RCX: ffffffff9c0a9468
[  500.253925] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88827fff41f8
[  500.277367] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
[  500.277370] Call Trace:
[  500.318081] RBP: ffff888428f80000 R08: ffffed104fffe840 R09: ffffed104fffe83f
[  500.318085] R10: ffffed104fffe83f R11: ffff88827fff41fb R12: ffff88827fff41f8
[  500.349838]  <IRQ>
[  500.381765] R13: ffff88827fff41c8 R14: ffff88842a96f770 R15: ffff88827fff41c8
[  500.381768] FS:  00007fdfd3559700(0000) GS:ffff8881f3c00000(0000) knlGS:0000000000000000
[  500.424074]  dump_stack+0x62/0x9a
[  500.435452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  500.435455] CR2: ffff888428f7fff8 CR3: 000000041abca003 CR4: 00000000001606b0
[  500.467546]  ubsan_epilogue+0xd/0x7f
[  500.500039] Call Trace:
[  500.500042] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci igb libahci i2c_algo_bit i2c_core libata dm_mirror dm_region_hash dm_log dm_mod efivarfs
[  500.509058]  __ubsan_handle_out_of_bounds+0x14d/0x192
[  500.541152] ---[ end trace f9ff2b89b6b88c5f ]---
[  500.541155] invalid opcode: 0000 [#2] SMP DEBUG_PAGEALLOC KASAN PTI
[  500.541159] CPU: 10 PID: 262 Comm: kcompactd0 Tainted: G      D W 5.0.0-next-20190305+ #49
[  500.541161] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
[  500.541167] RIP: 0010:__isolate_free_page+0x464/0x600
[  500.541170] Code: 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c6 20 6f 0b 9d 48 89 df e8 4a 8b f8 ff 0f 0b 48 c7 c7 a0 32 69 9d e8 51 40 43 00 <0f> 0b 48 c7 c7 e0 31 69 9d e8 43 40 43 00 48 c7 c6 80 71 0b 9d 48
[  500.541172] RSP: 0000:ffff8881f1fdf848 EFLAGS: 00010002
[  500.541175] RAX: 00000000f0000080 RBX: ffffea00064fc000 RCX: ffff88827fff41d0
[  500.541177] RDX: 1ffffd4000c9f806 RSI: 0000000000000008 RDI: ffffffff9d9f1640
[  500.541179] RBP: ffff8881f1fdf898 R08: ffffea00064fc000 R09: ffff8881f1fdfd30
[  500.541181] R10: 0000000000000002 R11: 1ffff1104fffe83b R12: 0000000000000008
[  500.541183] R13: dffffc0000000000 R14: ffff88827fff3000 R15: 0000000000000002
[  500.541185] FS:  0000000000000000(0000) GS:ffff8881f4100000(0000) knlGS:0000000000000000
[  500.541188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  500.541190] CR2: 00007fdce416a000 CR3: 000000026ea16002 CR4: 00000000001606a0
[  500.541191] Call Trace:
[  500.541199]  compaction_alloc+0x886/0x25f0
[  500.541221]  unmap_and_move+0x37/0x1e70
[  500.541228]  migrate_pages+0x2ca/0xb20
[  500.541238]  compact_zone+0x19cb/0x3620
[  500.541252]  kcompactd_do_work+0x2df/0x680

