* OOM in v4.8
From: Aaron Lu @ 2016-10-12  6:54 UTC
  To: Linux MM; +Cc: lkp, Huang Ying

[-- Attachment #1: Type: text/plain, Size: 6850 bytes --]

Hello,

There is a chromeswap test case:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/platform_CompressedSwapPerf

We made some small changes and ported it to our LKP environment:
https://github.com/aaronlu/chromeswap

The test starts nr_procs processes and lets each of them allocate an
equal share of memory with realloc, so anonymous pages are used. When
the pre-specified swap_target is reached, allocation stops. The total
allocation size is: MemFree + swap_target * SwapTotal.
After allocation, a random process is selected to touch its memory to
trigger swap in/out.

For this test, nr_procs is 50 and swap_target is 50%.
The test box has 8G of memory, of which 4G is used as a pmem block
device that serves as the swap partition.
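The sizing rule above is plain arithmetic; the Python sketch below just evaluates it. The helper name and the MemFree value are hypothetical (MemFree at test start is not given in this mail); SwapTotal = 4194300 kB is the figure from the OOM report, and the real test at the URL above may round or split the budget differently.

```python
from typing import Tuple


def chromeswap_budget_kb(mem_free_kb: int, swap_total_kb: int,
                         swap_target: float, nr_procs: int) -> Tuple[float, float]:
    """Return (total_kb, per_process_kb) for the allocation phase.

    Hypothetical helper: total = MemFree + swap_target * SwapTotal,
    split equally across nr_procs processes.
    """
    if nr_procs <= 0:
        raise ValueError("nr_procs must be positive")
    total_kb = mem_free_kb + swap_target * swap_total_kb
    return total_kb, total_kb / nr_procs


if __name__ == "__main__":
    # SwapTotal is taken from the OOM report; MemFree is an assumed value.
    total, per_proc = chromeswap_budget_kb(mem_free_kb=3_000_000,
                                           swap_total_kb=4_194_300,
                                           swap_target=0.5, nr_procs=50)
    print(f"total={total:.0f} kB, per process={per_proc:.0f} kB")
```

With swap_target at 50%, the swap share alone is 2097150 kB (about 2G), so each of the 50 processes gets roughly 41943 kB on top of its share of MemFree.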

OOM has occurred in this test recently, so I ran more tests:
on v4.6, all 10 tests passed;
on v4.7, 2 out of 10 tests OOMed;
on v4.8, 6 out of 10 tests OOMed;
on 101105b1717f, which is yesterday's head of Linus' master branch,
1 out of 10 tests OOMed.

So things are much better than in v4.8 now.

When the OOM occurred, there was still enough swap space, though:

kern  :warn  : [   38.708419] proc-vmstat invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
kern  :info  : [   38.720644] proc-vmstat cpuset=/ mems_allowed=0
kern  :warn  : [   38.726404] CPU: 5 PID: 500 Comm: proc-vmstat Not tainted 4.8.0 #1
kern  :warn  : [   38.733731] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
kern  :warn  : [   38.742114]  0000000000000000 ffff8800c106fb60 ffffffff8144f659 ffff8800c106fcf0
kern  :warn  : [   38.750709]  ffff88021c2325c0 ffff8800c106fbc8 ffffffff81209a0c 01018800c106fb70
kern  :warn  : [   38.759267]  ffffffff81ee14a0 0000000000000015 ffffffff81e43340 0000000000000206
kern  :warn  : [   38.767794] Call Trace:
kern  :warn  : [   38.771323]  [<ffffffff8144f659>] dump_stack+0x63/0x8a
kern  :warn  : [   38.777520]  [<ffffffff81209a0c>] dump_header+0x5c/0x1ff
kern  :warn  : [   38.783874]  [<ffffffff81181e3c>] oom_kill_process+0x22c/0x410
kern  :warn  : [   38.790731]  [<ffffffff810886a5>] ? has_capability_noaudit+0x25/0x40
kern  :warn  : [   38.798140]  [<ffffffff8118248a>] out_of_memory+0x41a/0x430
kern  :warn  : [   38.804722]  [<ffffffff811877db>] __alloc_pages_slowpath+0xa7b/0xaa0
kern  :warn  : [   38.812034]  [<ffffffff81187aab>] __alloc_pages_nodemask+0x2ab/0x2f0
kern  :warn  : [   38.819344]  [<ffffffff8107bf2e>] copy_process+0x11e/0x1990
kern  :warn  : [   38.826558]  [<ffffffff811e7696>] ? kmem_cache_alloc+0x1a6/0x1c0
kern  :warn  : [   38.833483]  [<ffffffff813e4427>] ? selinux_file_alloc_security+0x37/0x60
kern  :warn  : [   38.841172]  [<ffffffff813e4427>] ? selinux_file_alloc_security+0x37/0x60
kern  :warn  : [   38.848858]  [<ffffffff8107d96a>] _do_fork+0xca/0x3f0
kern  :warn  : [   38.854793]  [<ffffffff8122d367>] ? __fd_install+0x37/0x100
kern  :warn  : [   38.861231]  [<ffffffff8107dd39>] SyS_clone+0x19/0x20
kern  :warn  : [   38.867122]  [<ffffffff81003bb7>] do_syscall_64+0x67/0x160
kern  :warn  : [   38.873459]  [<ffffffff819341e1>] entry_SYSCALL64_slow_path+0x25/0x25
kern  :warn  : [   38.880744] Mem-Info:
kern  :warn  : [   38.883875] active_anon:622526 inactive_anon:154230 isolated_anon:0
                               active_file:0 inactive_file:1 isolated_file:0
                               unevictable:94198 dirty:0 writeback:0 unstable:3
                               slab_reclaimable:59989 slab_unreclaimable:6489
                               mapped:6022 shmem:257 pagetables:3956 bounce:0
                               free:17325 free_pcp:357 free_cma:897
kern  :warn  : [   38.920992] Node 0 active_anon:2477952kB inactive_anon:619360kB active_file:0kB inactive_file:4kB unevictable:376792kB isolated(anon):0kB isolated(file):0kB mapped:24088kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 12288kB anon_thp: 1028kB writeback_tmp:0kB unstable:12kB pages_scanned:0 all_unreclaimable? no
kern  :warn  : [   38.952034] Node 0 DMA free:2008kB min:280kB low:348kB high:416kB active_anon:1112kB inactive_anon:28kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:12704kB slab_unreclaimable:48kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
kern  :warn  : [   38.984654] lowmem_reserve[]: 0 3430 3524 3524 3524
kern  :warn  : [   38.990371] Node 0 DMA32 free:83292kB min:61984kB low:77480kB high:92976kB active_anon:2395900kB inactive_anon:454596kB active_file:0kB inactive_file:4kB unevictable:314016kB writepending:0kB present:3578492kB managed:3512924kB mlocked:92kB slab_reclaimable:218640kB slab_unreclaimable:6036kB kernel_stack:1744kB pagetables:14040kB bounce:0kB free_pcp:2160kB local_pcp:36kB free_cma:0kB
kern  :warn  : [   39.027044] lowmem_reserve[]: 0 0 94 94 94
kern  :warn  : [   39.031921] Node 0 Normal free:5448kB min:5316kB low:6644kB high:7972kB active_anon:61364kB inactive_anon:162752kB active_file:0kB inactive_file:0kB unevictable:62776kB writepending:0kB present:505856kB managed:420724kB mlocked:2124kB slab_reclaimable:17396kB slab_unreclaimable:19876kB kernel_stack:2992kB pagetables:1784kB bounce:0kB free_pcp:60kB local_pcp:0kB free_cma:4004kB
kern  :warn  : [   39.067344] lowmem_reserve[]: 0 0 0 0 0
kern  :warn  : [   39.072034] Node 0 DMA: 47*4kB (E) 9*8kB (E) 3*16kB (H) 1*32kB (H) 1*64kB (H) 1*128kB (H) 0*256kB 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 2068kB
kern  :warn  : [   39.087289] Node 0 DMA32: 9514*4kB (UME) 3085*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62736kB
kern  :warn  : [   39.101342] Node 0 Normal: 203*4kB (UEC) 400*8kB (UEC) 107*16kB (HC) 5*32kB (C) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5884kB
kern  :info  : [   39.116367] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
kern  :info  : [   39.126827] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
kern  :warn  : [   39.136481] 94403 total pagecache pages
kern  :warn  : [   39.141514] 10 pages in swap cache
kern  :warn  : [   39.145995] Swap cache stats: add 6689320, delete 6689310, find 6080/3054405
kern  :warn  : [   39.154135] Free swap  = 1845896kB
kern  :warn  : [   39.158638] Total swap = 4194300kB
kern  :warn  : [   39.163152] 1025083 pages RAM
kern  :warn  : [   39.167218] 0 pages HighMem/MovableOnly
kern  :warn  : [   39.172167] 37696 pages reserved
kern  :warn  : [   39.176504] 51200 pages cma reserved
kern  :warn  : [   39.181223] 0 pages hwpoisoned

I wonder if this OOM could/should be avoided?

Full dmesg logs for v4.7, v4.8 and 101105b1717f are attached; please let
me know if you need more information.

Thanks,
Aaron

[-- Attachment #2: v4.7.xz --]
[-- Type: application/x-xz, Size: 21796 bytes --]

[-- Attachment #3: v4.8.xz --]
[-- Type: application/x-xz, Size: 23552 bytes --]

[-- Attachment #4: 101105b1717f.xz --]
[-- Type: application/x-xz, Size: 21840 bytes --]

* Re: OOM in v4.8
From: Michal Hocko @ 2016-10-12  7:44 UTC
  To: Aaron Lu; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

[Let's CC Vlastimil]

On Wed 12-10-16 14:54:23, Aaron Lu wrote:
> Hello,
> 
> There is a chromeswap test case:
> https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/platform_CompressedSwapPerf
> 
> We made some small changes and ported it to our LKP environment:
> https://github.com/aaronlu/chromeswap
> 
> The test starts nr_procs processes and lets each of them allocate an
> equal share of memory with realloc, so anonymous pages are used. When
> the pre-specified swap_target is reached, allocation stops. The total
> allocation size is: MemFree + swap_target * SwapTotal.
> After allocation, a random process is selected to touch its memory to
> trigger swap in/out.
> 
> For this test, nr_procs is 50 and swap_target is 50%.
> The test box has 8G of memory, of which 4G is used as a pmem block
> device that serves as the swap partition.
> 
> OOM has occurred in this test recently, so I ran more tests:
> on v4.6, all 10 tests passed;
> on v4.7, 2 out of 10 tests OOMed;
> on v4.8, 6 out of 10 tests OOMed;
> on 101105b1717f, which is yesterday's head of Linus' master branch,
> 1 out of 10 tests OOMed.

Could you try to retest with the current linux-next please?
 
> So things are much better than in v4.8 now.
> 
> When the OOM occurred, there was still enough swap space, though:
> 
> kern  :warn  : [   38.708419] proc-vmstat invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
> kern  :warn  : [   38.880744] Mem-Info:
> kern  :warn  : [   38.883875] active_anon:622526 inactive_anon:154230 isolated_anon:0
>                                active_file:0 inactive_file:1 isolated_file:0
>                                unevictable:94198 dirty:0 writeback:0 unstable:3
>                                slab_reclaimable:59989 slab_unreclaimable:6489
>                                mapped:6022 shmem:257 pagetables:3956 bounce:0
>                                free:17325 free_pcp:357 free_cma:897
[...]
> kern  :warn  : [   38.952034] Node 0 DMA free:2008kB min:280kB low:348kB high:416kB active_anon:1112kB inactive_anon:28kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:12704kB slab_unreclaimable:48kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> kern  :warn  : [   38.984654] lowmem_reserve[]: 0 3430 3524 3524 3524
> kern  :warn  : [   38.990371] Node 0 DMA32 free:83292kB min:61984kB low:77480kB high:92976kB active_anon:2395900kB inactive_anon:454596kB active_file:0kB inactive_file:4kB unevictable:314016kB writepending:0kB present:3578492kB managed:3512924kB mlocked:92kB slab_reclaimable:218640kB slab_unreclaimable:6036kB kernel_stack:1744kB pagetables:14040kB bounce:0kB free_pcp:2160kB local_pcp:36kB free_cma:0kB
> kern  :warn  : [   39.027044] lowmem_reserve[]: 0 0 94 94 94
> kern  :warn  : [   39.031921] Node 0 Normal free:5448kB min:5316kB low:6644kB high:7972kB active_anon:61364kB inactive_anon:162752kB active_file:0kB inactive_file:0kB unevictable:62776kB writepending:0kB present:505856kB managed:420724kB mlocked:2124kB slab_reclaimable:17396kB slab_unreclaimable:19876kB kernel_stack:2992kB pagetables:1784kB bounce:0kB free_pcp:60kB local_pcp:0kB free_cma:4004kB
> kern  :warn  : [   39.067344] lowmem_reserve[]: 0 0 0 0 0
> kern  :warn  : [   39.072034] Node 0 DMA: 47*4kB (E) 9*8kB (E) 3*16kB (H) 1*32kB (H) 1*64kB (H) 1*128kB (H) 0*256kB 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 2068kB
> kern  :warn  : [   39.087289] Node 0 DMA32: 9514*4kB (UME) 3085*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62736kB
> kern  :warn  : [   39.101342] Node 0 Normal: 203*4kB (UEC) 400*8kB (UEC) 107*16kB (HC) 5*32kB (C) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5884kB

OK, so your system is close to the min watermarks and is requesting an
order-2 page, which cannot be allocated because a GFP_KERNEL (i.e.
unmovable) allocation cannot fall back to CMA reserved blocks. Do you
see the same when CMA is not involved?
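To make the watermark argument concrete, here is a simplified Python model of the per-zone check applied to this order-2 GFP_KERNEL request, loosely following __zone_watermark_ok() in mm/page_alloc.c. It is a sketch, not the kernel code, and it assumes: the min watermark is used; no ALLOC_HIGH, ALLOC_HARDER or ALLOC_CMA flags apply; nr_reserved_highatomic is 0 (the report does not print it); the reserve is the third (Normal) entry of each zone's lowmem_reserve[] line; and the migratetype letters in the buddy lines tell which free lists have blocks at each order.

```python
from typing import Dict, Set

# Migratetype letters as printed in the buddy lines of the report:
# U=unmovable, M=movable, E=reclaimable, H=highatomic, C=CMA.
PCP_TYPES = {"U", "M", "E"}
MAX_ORDER = 11  # orders 0..10, i.e. 4kB..4096kB blocks


def watermark_ok(free_kb: int, min_kb: int, lowmem_reserve: int,
                 free_cma_kb: int, order: int,
                 types_by_order: Dict[int, Set[str]],
                 alloc_cma: bool = False,
                 nr_reserved_highatomic: int = 0) -> bool:
    """Simplified model of a zone watermark check (not the kernel code)."""
    free = free_kb // 4 - ((1 << order) - 1)   # work in 4kB pages
    free -= nr_reserved_highatomic             # assumed 0: not in the report
    if not alloc_cma:
        free -= free_cma_kb // 4               # free CMA pages don't count
    if free <= min_kb // 4 + lowmem_reserve:
        return False
    if order == 0:
        return True
    # A high-order request also needs a free block of at least that order
    # on a usable list; highatomic (H) blocks would only count for
    # ALLOC_HARDER requests, which this model does not cover.
    usable = PCP_TYPES | ({"C"} if alloc_cma else set())
    return any(types_by_order.get(o, set()) & usable
               for o in range(order, MAX_ORDER))


# name: (free kB, min kB, lowmem_reserve[Normal] pages, free_cma kB, types)
ZONES = {
    "DMA": (2008, 280, 3524, 0,
            {0: {"E"}, 1: {"E"}, 2: {"H"}, 3: {"H"}, 4: {"H"},
             5: {"H"}, 7: {"H"}, 8: {"H"}}),
    "DMA32": (83292, 61984, 94, 0, {0: set("UME"), 1: set("ME")}),
    "Normal": (5448, 5316, 0, 4004,
               {0: set("UEC"), 1: set("UEC"), 2: set("HC"), 3: {"C"}}),
}

if __name__ == "__main__":
    for name, (free, wmin, reserve, cma, types) in ZONES.items():
        plain = watermark_ok(free, wmin, reserve, cma, 2, types)
        with_cma = watermark_ok(free, wmin, reserve, cma, 2, types,
                                alloc_cma=True)
        print(f"{name}: order-2 ok={plain}, if CMA were allowed ok={with_cma}")
```

With the numbers from the report, all three zones fail the order-2 check (DMA and Normal on the free-page count, DMA32 for lack of any free block of order 2 or higher), while Normal would pass if CMA pageblocks were usable, which is consistent with the CMA explanation above.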

Anyway, 4.8 temporarily disabled the compaction feedback for the OOM
decision and used a watermark-based estimation instead. 4.9 will bring
the compaction feedback approach back, along with many compaction
improvements, so it is definitely worth retesting with linux-next or
4.9-rc1.

> kern  :info  : [   39.116367] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> kern  :info  : [   39.126827] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> kern  :warn  : [   39.136481] 94403 total pagecache pages
> kern  :warn  : [   39.141514] 10 pages in swap cache
> kern  :warn  : [   39.145995] Swap cache stats: add 6689320, delete 6689310, find 6080/3054405
> kern  :warn  : [   39.154135] Free swap  = 1845896kB
> kern  :warn  : [   39.158638] Total swap = 4194300kB
> kern  :warn  : [   39.163152] 1025083 pages RAM
> kern  :warn  : [   39.167218] 0 pages HighMem/MovableOnly
> kern  :warn  : [   39.172167] 37696 pages reserved
> kern  :warn  : [   39.176504] 51200 pages cma reserved
> kern  :warn  : [   39.181223] 0 pages hwpoisoned
> 
> I wonder if this OOM could/should be avoided?

The system is highly fragmented and low on memory, but there is a lot of
anonymous memory which we should at least try to compact into contiguous
blocks, so I believe we should be able to cope with this much better.
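One rough way to quantify "highly fragmented" is to parse the per-order buddy lines from the report and compute what share of each zone's free memory sits in blocks of at least the requested order. The Python sketch below is a hypothetical helper written for this discussion, not an existing kernel or LKP tool; it assumes 4kB base pages and the "count*sizekB (TYPES)" format shown above, and it ignores migratetypes.

```python
import re
from typing import Dict, Tuple

# "Node 0 DMA32: 9514*4kB (UME) 3085*8kB (ME) ... = 62736kB"
LINE_RE = re.compile(r"Node (\d+) (\w+): (.*) = (\d+)kB\s*$")
TERM_RE = re.compile(r"(\d+)\*(\d+)kB")


def parse_buddy_line(line: str) -> Tuple[str, Dict[int, int], int]:
    """Return (zone, {order: free kB}, reported total kB) for one line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a per-order free-area line")
    by_order = {}
    for count, size_kb in TERM_RE.findall(m.group(3)):
        order = (int(size_kb) // 4).bit_length() - 1  # 4kB->0, 8kB->1, ...
        by_order[order] = int(count) * int(size_kb)
    return m.group(2), by_order, int(m.group(4))


def fraction_at_or_above(by_order: Dict[int, int], order: int) -> float:
    """Share of free kB sitting in blocks of at least `order` (any type)."""
    total = sum(by_order.values())
    if total == 0:
        return 0.0
    return sum(kb for o, kb in by_order.items() if o >= order) / total


if __name__ == "__main__":
    lines = [
        "Node 0 DMA32: 9514*4kB (UME) 3085*8kB (ME) 0*16kB 0*32kB 0*64kB "
        "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62736kB",
        "Node 0 Normal: 203*4kB (UEC) 400*8kB (UEC) 107*16kB (HC) 5*32kB (C) "
        "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5884kB",
    ]
    for line in lines:
        zone, by_order, total = parse_buddy_line(line)
        assert sum(by_order.values()) == total  # cross-check the parse
        print(f"{zone}: {fraction_at_or_above(by_order, 2):.1%} of {total} kB "
              "free is in order>=2 blocks")
```

For the report above this gives 0.0% for DMA32 and about 31.8% for Normal; the Normal figure still overstates what GFP_KERNEL can use, because its order-2 and order-3 blocks are listed only as highatomic (H) and CMA (C).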
-- 
Michal Hocko
SUSE Labs

* Re: OOM in v4.8
From: Michal Hocko @ 2016-10-12  8:00 UTC
  To: Aaron Lu; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

On Wed 12-10-16 09:44:11, Michal Hocko wrote:
> [Let's CC Vlastimil]
> 
> On Wed 12-10-16 14:54:23, Aaron Lu wrote:
> > Hello,
> > 
> > There is a chromeswap test case:
> > https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/platform_CompressedSwapPerf
> > 
> > We made some small changes and ported it to our LKP environment:
> > https://github.com/aaronlu/chromeswap
> > 
> > The test starts nr_procs processes and lets each of them allocate an
> > equal share of memory with realloc, so anonymous pages are used. When
> > the pre-specified swap_target is reached, allocation stops. The total
> > allocation size is: MemFree + swap_target * SwapTotal.
> > After allocation, a random process is selected to touch its memory to
> > trigger swap in/out.
> > 
> > For this test, nr_procs is 50 and swap_target is 50%.
> > The test box has 8G of memory, of which 4G is used as a pmem block
> > device that serves as the swap partition.
> > 
> > OOM has occurred in this test recently, so I ran more tests:
> > on v4.6, all 10 tests passed;
> > on v4.7, 2 out of 10 tests OOMed;
> > on v4.8, 6 out of 10 tests OOMed;
> > on 101105b1717f, which is yesterday's head of Linus' master branch,
> > 1 out of 10 tests OOMed.
> 
> Could you try to retest with the current linux-next please?

And I am obviously blind, because you have already tested with
101105b1717f, which contains Andrew's patchbomb and thus all the
relevant changes. Now that I am looking into your log for that kernel,
there doesn't seem to be any OOM killer invocation. There is only
kern  :warn  : [  177.175954] perf: page allocation failure: order:2, mode:0x208c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO)

which is a failed atomic high-order request, and that is not all that
unexpected when the system is low on memory. The allocation failure
report is hard to read because of unexpected line breaks, but I suspect
that, again, we are not able to allocate because CMA is standing in the
way. I wouldn't call the above failure critical, though.
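The two masks discussed in this thread can be decoded with a small Python sketch. The bit values are a reading of include/linux/gfp.h around v4.8 and should be treated as an assumption, since they change between releases; as a sanity check, they reproduce both 0x27000c0 (GFP_KERNEL_ACCOUNT|__GFP_NOTRACK) and 0x208c020 (GFP_ATOMIC|__GFP_COMP|__GFP_ZERO) as printed in the logs. Only the flags needed for these two masks are listed.

```python
from typing import List

# Subset of v4.8-era gfp.h bit values (assumed; they differ between releases).
GFP_BITS = {
    "__GFP_HIGH": 0x20,
    "__GFP_IO": 0x40,
    "__GFP_FS": 0x80,
    "__GFP_COMP": 0x4000,
    "__GFP_ZERO": 0x8000,
    "__GFP_ATOMIC": 0x80000,
    "__GFP_ACCOUNT": 0x100000,
    "__GFP_NOTRACK": 0x200000,
    "__GFP_DIRECT_RECLAIM": 0x400000,
    "__GFP_KSWAPD_RECLAIM": 0x2000000,
}
KNOWN = 0
for _bit in GFP_BITS.values():
    KNOWN |= _bit

# Composite masks built the way gfp.h defines them.
GFP_KERNEL = (GFP_BITS["__GFP_DIRECT_RECLAIM"] | GFP_BITS["__GFP_KSWAPD_RECLAIM"]
              | GFP_BITS["__GFP_IO"] | GFP_BITS["__GFP_FS"])
GFP_KERNEL_ACCOUNT = GFP_KERNEL | GFP_BITS["__GFP_ACCOUNT"]
GFP_ATOMIC = (GFP_BITS["__GFP_HIGH"] | GFP_BITS["__GFP_ATOMIC"]
              | GFP_BITS["__GFP_KSWAPD_RECLAIM"])


def decode_gfp(mask: int) -> List[str]:
    """Names of the known flags set in mask; reject bits outside the table."""
    unknown = mask & ~KNOWN
    if unknown:
        raise ValueError(f"bits not in this table: {unknown:#x}")
    return [name for name, bit in GFP_BITS.items() if mask & bit]


def can_direct_reclaim(mask: int) -> bool:
    """Without __GFP_DIRECT_RECLAIM the caller won't reclaim/compact itself."""
    return bool(mask & GFP_BITS["__GFP_DIRECT_RECLAIM"])


if __name__ == "__main__":
    for label, mask in (("proc-vmstat", 0x27000c0), ("perf", 0x208c020)):
        print(label, hex(mask), decode_gfp(mask),
              "direct reclaim:", can_direct_reclaim(mask))
```

The perf request carries __GFP_KSWAPD_RECLAIM but not __GFP_DIRECT_RECLAIM, so it can only wake kswapd and cannot reclaim or compact on its own, which is why its failure under memory pressure is not surprising.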
-- 
Michal Hocko
SUSE Labs

* Re: OOM in v4.8
From: Aaron Lu @ 2016-10-12  8:24 UTC
  To: Michal Hocko; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

On 10/12/2016 04:00 PM, Michal Hocko wrote:
> On Wed 12-10-16 09:44:11, Michal Hocko wrote:
>> [Let's CC Vlastimil]
>>
>> On Wed 12-10-16 14:54:23, Aaron Lu wrote:
>>> Hello,
>>>
>>> There is a chromeswap test case:
>>> https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/platform_CompressedSwapPerf
>>>
>>> We have done small changes and ported it to our LKP environment:
>>> https://github.com/aaronlu/chromeswap
>>>
>>> The test starts nr_procs processes and lets them each allocate some
>>> memory equally with realloc, so anonymous pages are used. When the
>>> pre-specified swap_target is reached, the allocation will stop. The
>>> total allocation size is: MemFree + swap_target * SwapTotal.
>>> After allocation, a random process is selected to touch its memory to
>>> trigger swap in/out.
>>>
>>> For this test, nr_procs is 50 and swap_target is 50%.
>>> The test box has 8G memory where 4G is used as a pmem block device and
>>> created as the swap partition.
>>>
>>> OOM occurred for this test recently, so I did more tests:
>>> on v4.6, 10 tests all pass;
>>> on v4.7, 2 tests OOMed out of 10 tests;
>>> on v4.8, 6 tests OOMed out of 10 tests;
>>> on 101105b1717f, which is yesterday's Linus' master branch head,
>>> 1 test OOMed out of 10 tests.
>>
>> Could you try to retest with the current linux-next please?
> 
> And I am obviously blind because you have already tested with
> 101105b1717f which contains the Andrew patchbomb and so all the relevant
> changes. Now that I am looking into your log for that kernel there
> doesn't seem to be any OOM killer invocation. There is only
> kern  :warn  : [  177.175954] perf: page allocation failure: order:2, mode:0x208c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO)

Oh right, perf may fail, but that shouldn't cause the test to be terminated.
I'll need to check why that test is marked as OOM.

Another possibility is that an OOM occurred later, while the chromeswap test
was requesting memory, but for some reason the log wasn't properly saved.

> 
> which is an atomic high order request that failed which is not all that
> unexpected when the system is low on memory. The allocation failure
> report is hard to read because of unexpected end-of-lines but I suspect

Sorry about that. I'll try to find out why dmesg gets saved so garbled on
that test box.

> that again we are not able to allocate because of the CMA standing in
> the way. I wouldn't call the above failure critical though.
 
I'll test that commit and v4.8 again with cma=0 added to cmdline.
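A dropped or mistyped boot parameter would silently invalidate such a rerun, so it can help to confirm that cma=0 actually reached the kernel. Here is a minimal sketch. It assumes only that `/proc/cmdline` holds the space-separated boot parameters. The helper name is hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Split a /proc/cmdline string into {param: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params
```

For example, `parse_cmdline(open("/proc/cmdline").read()).get("cma") == "0"` checks the running kernel.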

Thanks for taking a look at this.

Regards,
Aaron

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: OOM in v4.8
  2016-10-12  8:24       ` Aaron Lu
@ 2016-10-12  8:43         ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2016-10-12  8:43 UTC (permalink / raw)
  To: Aaron Lu; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

On Wed 12-10-16 16:24:47, Aaron Lu wrote:
> On 10/12/2016 04:00 PM, Michal Hocko wrote:
[...]
> > which is an atomic high order request that failed which is not all that
> > unexpected when the system is low on memory. The allocation failure
> > report is hard to read because of unexpected end-of-lines but I suspect
> 
> Sorry about that, I'll try to find out why dmesg is saved so ugly on
> that test box.

Not your fault. This seems to be 4bcc595ccd80 ("printk: reinstate
KERN_CONT for printing continuation lines")

> > that again we are not able to allocate because of the CMA standing in
> > the way. I wouldn't call the above failure critical though.
>  
> I'll test that commit and v4.8 again with cma=0 added to cmdline.

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: OOM in v4.8
  2016-10-12  8:43         ` Michal Hocko
@ 2016-10-12 13:38           ` Aaron Lu
  -1 siblings, 0 replies; 16+ messages in thread
From: Aaron Lu @ 2016-10-12 13:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 1176 bytes --]

On Wed, Oct 12, 2016 at 10:43:42AM +0200, Michal Hocko wrote:
> On Wed 12-10-16 16:24:47, Aaron Lu wrote:
> > On 10/12/2016 04:00 PM, Michal Hocko wrote:
> [...]
> > > which is an atomic high order request that failed which is not all that
> > > unexpected when the system is low on memory. The allocation failure
> > > report is hard to read because of unexpected end-of-lines but I suspect
> > 
> > Sorry about that, I'll try to find out why dmesg is saved so ugly on
> > that test box.
> 
> Not your fault. This seems to be 4bcc595ccd80 ("printk: reinstate
> KERN_CONT for printing continuation lines")
> 
> > > that again we are not able to allocate because of the CMA standing in
> > > the way. I wouldn't call the above failure critical though.
> >  
> > I'll test that commit and v4.8 again with cma=0 added to cmdline.
> 
> Thanks!

With cma=0:
1) on v4.8, 8 tests OOMed out of 10 tests;
2) on 101105b1717f, 1 test OOMed out of 10 tests, as before.

It seems to be worse for v4.8; previously it was 6 failures.
For 101105b1717f, it's the same case: perf requested an order-2 atomic
allocation that failed, and no OOM killer was invoked.

Both dmesgs are attached.

Thanks,
Aaron

[-- Attachment #2: v4.8.xz --]
[-- Type: application/x-xz, Size: 26780 bytes --]

[-- Attachment #3: 101105b1717f.xz --]
[-- Type: application/x-xz, Size: 21928 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: OOM in v4.8
  2016-10-12  8:24       ` Aaron Lu
@ 2016-10-13  6:23         ` Aaron Lu
  -1 siblings, 0 replies; 16+ messages in thread
From: Aaron Lu @ 2016-10-13  6:23 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

On 10/12/2016 04:24 PM, Aaron Lu wrote:
> On 10/12/2016 04:00 PM, Michal Hocko wrote:
>> On Wed 12-10-16 09:44:11, Michal Hocko wrote:
>>> [Let's CC Vlastimil]
>>>
>>> On Wed 12-10-16 14:54:23, Aaron Lu wrote:
>>>> Hello,
>>>>
>>>> There is a chromeswap test case:
>>>> https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/platform_CompressedSwapPerf
>>>>
>>>> We have done small changes and ported it to our LKP environment:
>>>> https://github.com/aaronlu/chromeswap
>>>>
>>>> The test starts nr_procs processes and lets them each allocate some
>>>> memory equally with realloc, so anonymous pages are used. When the
>>>> pre-specified swap_target is reached, the allocation will stop. The
>>>> total allocation size is: MemFree + swap_target * SwapTotal.
>>>> After allocation, a random process is selected to touch its memory to
>>>> trigger swap in/out.
>>>>
>>>> For this test, nr_procs is 50 and swap_target is 50%.
>>>> The test box has 8G memory where 4G is used as a pmem block device and
>>>> created as the swap partition.
>>>>
>>>> OOM occurred for this test recently, so I did more tests:
>>>> on v4.6, 10 tests all pass;
>>>> on v4.7, 2 tests OOMed out of 10 tests;
>>>> on v4.8, 6 tests OOMed out of 10 tests;
>>>> on 101105b1717f, which is yesterday's Linus' master branch head,
>>>> 1 test OOMed out of 10 tests.
>>>
>>> Could you try to retest with the current linux-next please?
>>
>> And I am obviously blind because you have already tested with
>> 101105b1717f which contains the Andrew patchbomb and so all the relevant
>> changes. Now that I am looking into your log for that kernel there
>> doesn't seem to be any OOM killer invocation. There is only
>> kern  :warn  : [  177.175954] perf: page allocation failure: order:2, mode:0x208c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO)
> 
> Oh right, perf may fail but that shouldn't make the test be terminated.
> I'll need to check why OOM is marked for that test.

There is a monitor in our test infrastructure that periodically checks
dmesg for messages like "out of memory", "page allocation failure", etc.
If any of those messages are found, the test result is considered
untrustworthy and the test is killed, since most of our tests are
performance related.

That is why the perf page allocation failure caused the test to be
marked as OOM. When I ran it on commit 101105b1717f without starting perf, all
10 tests finished without any OOM failures.
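A monitor of the kind described above can be sketched in a few lines. The pattern list is a hypothetical subset: the message names only "out of memory" and "page allocation failure" and then says "etc.". The real LKP monitor may also work differently.

```python
# Hypothetical subset of the patterns; the real monitor checks more ("etc.").
OOM_PATTERNS = ("out of memory", "page allocation failure")


def flag_oom_lines(dmesg_lines, patterns=OOM_PATTERNS):
    """Return the dmesg lines that contain any pattern (case-insensitive)."""
    lowered = tuple(p.lower() for p in patterns)
    return [line for line in dmesg_lines
            if any(p in line.lower() for p in lowered)]
```

Because the match is a plain substring test, a failed order-2 atomic allocation from perf is flagged exactly like an OOM kill. That is how this run came to be marked OOM.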

Thanks,
Aaron

> 
> Another possibility is, OOM occurred later when the chromeswap test is
> requesting memory but for some reason, the log isn't properly saved.
> 
>>
>> which is an atomic high order request that failed which is not all that
>> unexpected when the system is low on memory. The allocation failure
>> report is hard to read because of unexpected end-of-lines but I suspect
> 
> Sorry about that, I'll try to find out why dmesg is saved so ugly on
> that test box.
> 
>> that again we are not able to allocate because of the CMA standing in
>> the way. I wouldn't call the above failure critical though.
>  
> I'll test that commit and v4.8 again with cma=0 added to cmdline.
> 
> Thanks for taking a look at this.
> 
> Regards,
> Aaron
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: OOM in v4.8
  2016-10-13  6:23         ` Aaron Lu
@ 2016-10-13  6:34           ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2016-10-13  6:34 UTC (permalink / raw)
  To: Aaron Lu; +Cc: Linux MM, lkp, Huang Ying, Vlastimil Babka

On Thu 13-10-16 14:23:54, Aaron Lu wrote:
> On 10/12/2016 04:24 PM, Aaron Lu wrote:
> > On 10/12/2016 04:00 PM, Michal Hocko wrote:
[...]
> >> And I am obviously blind because you have already tested with
> >> 101105b1717f which contains the Andrew patchbomb and so all the relevant
> >> changes. Now that I am looking into your log for that kernel there
> >> doesn't seem to be any OOM killer invocation. There is only
> >> kern  :warn  : [  177.175954] perf: page allocation failure: order:2, mode:0x208c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO)
> > 
> > Oh right, perf may fail but that shouldn't make the test be terminated.
> > I'll need to check why OOM is marked for that test.
> 
> There is a monitor in our test infrastructure that periodically checks
> dmesg for messages like "out of memory", "page allocation failure", etc.
> And if those messages are found, the test is believed not trustworthy
> and killed since most of our tests are performance related.
> 
> That is the reason why "perf page allocation failure" caused the test to
> be marked OOM. I tried to not start perf and with commit 101105b1717f,
> 10 tests finished without any OOM failures.

Thanks for double checking!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-10-13  6:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-12  6:54 OOM in v4.8 Aaron Lu
2016-10-12  6:54 ` Aaron Lu
2016-10-12  7:44 ` Michal Hocko
2016-10-12  7:44   ` Michal Hocko
2016-10-12  8:00   ` Michal Hocko
2016-10-12  8:00     ` Michal Hocko
2016-10-12  8:24     ` Aaron Lu
2016-10-12  8:24       ` Aaron Lu
2016-10-12  8:43       ` Michal Hocko
2016-10-12  8:43         ` Michal Hocko
2016-10-12 13:38         ` Aaron Lu
2016-10-12 13:38           ` Aaron Lu
2016-10-13  6:23       ` Aaron Lu
2016-10-13  6:23         ` Aaron Lu
2016-10-13  6:34         ` Michal Hocko
2016-10-13  6:34           ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.