* [PATCH 1/2] mm: add per-zone lru list stat
From: Minchan Kim @ 2016-07-19 15:50 UTC
To: Andrew Morton
Cc: Mel Gorman, Vlastimil Babka, Johannes Weiner, Joonsoo Kim,
linux-mm, linux-kernel, Minchan Kim
While running a stress test with hackbench, I frequently got OOM messages,
which never happened with zone-lru.
gfp_mask=0x26004c0(GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK), order=0
..
..
[<c71a76e2>] __alloc_pages_nodemask+0xe52/0xe60
[<c71f31dc>] ? new_slab+0x39c/0x3b0
[<c71f31dc>] new_slab+0x39c/0x3b0
[<c71f4eca>] ___slab_alloc.constprop.87+0x6da/0x840
[<c763e6fc>] ? __alloc_skb+0x3c/0x260
[<c777e127>] ? _raw_spin_unlock_irq+0x27/0x60
[<c70cebfc>] ? trace_hardirqs_on_caller+0xec/0x1b0
[<c70a1506>] ? finish_task_switch+0xa6/0x220
[<c7219ee0>] ? poll_select_copy_remaining+0x140/0x140
[<c7201645>] __slab_alloc.isra.81.constprop.86+0x40/0x6d
[<c763e6fc>] ? __alloc_skb+0x3c/0x260
[<c71f525c>] kmem_cache_alloc+0x22c/0x260
[<c763e6fc>] ? __alloc_skb+0x3c/0x260
[<c763e6fc>] __alloc_skb+0x3c/0x260
[<c763eece>] alloc_skb_with_frags+0x4e/0x1a0
[<c7638d6a>] sock_alloc_send_pskb+0x16a/0x1b0
[<c770b581>] ? wait_for_unix_gc+0x31/0x90
[<c71cfb1d>] ? alloc_set_pte+0x2ad/0x310
[<c77084dd>] unix_stream_sendmsg+0x28d/0x340
[<c7634dad>] sock_sendmsg+0x2d/0x40
[<c7634e2c>] sock_write_iter+0x6c/0xc0
[<c7204a90>] __vfs_write+0xc0/0x120
[<c72053ab>] vfs_write+0x9b/0x1a0
[<c71cc4a9>] ? __might_fault+0x49/0xa0
[<c72062c4>] SyS_write+0x44/0x90
[<c70036c6>] do_fast_syscall_32+0xa6/0x1e0
[<c777ea2c>] sysenter_past_esp+0x45/0x74
Mem-Info:
active_anon:104698 inactive_anon:105791 isolated_anon:192
active_file:433 inactive_file:283 isolated_file:22
unevictable:0 dirty:0 writeback:296 unstable:0
slab_reclaimable:6389 slab_unreclaimable:78927
mapped:474 shmem:0 pagetables:101426 bounce:0
free:10518 free_pcp:334 free_cma:0
Node 0 active_anon:418792kB inactive_anon:423164kB active_file:1732kB inactive_file:1132kB unevictable:0kB isolated(anon):768kB isolated(file):88kB mapped:1896kB dirty:0kB writeback:1184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1478632 all_unreclaimable? yes
DMA free:3304kB min:68kB low:84kB high:100kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:4088kB kernel_stack:0kB pagetables:2480kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 809 1965 1965
Normal free:3436kB min:3604kB low:4504kB high:5404kB present:897016kB managed:858460kB mlocked:0kB slab_reclaimable:25556kB slab_unreclaimable:311712kB kernel_stack:164608kB pagetables:30844kB bounce:0kB free_pcp:620kB local_pcp:104kB free_cma:0kB
lowmem_reserve[]: 0 0 9247 9247
HighMem free:33808kB min:512kB low:1796kB high:3080kB present:1183736kB managed:1183736kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:372252kB bounce:0kB free_pcp:428kB local_pcp:72kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 2*4kB (UM) 2*8kB (UM) 0*16kB 1*32kB (U) 1*64kB (U) 2*128kB (UM) 1*256kB (U) 1*512kB (M) 0*1024kB 1*2048kB (U) 0*4096kB = 3192kB
Normal: 33*4kB (MH) 79*8kB (ME) 11*16kB (M) 4*32kB (M) 2*64kB (ME) 2*128kB (EH) 7*256kB (EH) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3244kB
HighMem: 2590*4kB (UM) 1568*8kB (UM) 491*16kB (UM) 60*32kB (UM) 6*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33064kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
25121 total pagecache pages
24160 pages in swap cache
Swap cache stats: add 86371, delete 62211, find 42865/60187
Free swap = 4015560kB
Total swap = 4192252kB
524186 pages RAM
295934 pages HighMem/MovableOnly
9658 pages reserved
0 pages cma reserved
The order-0 allocation for the normal zone failed even though there was a
lot of reclaimable memory (i.e., anonymous memory with free swap). I wanted
to analyze the problem, but it was hard because we removed the per-zone lru
stat, so I couldn't tell how much anonymous memory there was in the
normal/dma zones.
When we investigate an OOM problem, the reclaimable memory count is a crucial
stat for finding the cause. Without it, it's hard to interpret the OOM
message, so I believe we should keep it.
With per-zone lru stat,
gfp_mask=0x26004c0(GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK), order=0
Mem-Info:
active_anon:101103 inactive_anon:102219 isolated_anon:0
active_file:503 inactive_file:544 isolated_file:0
unevictable:0 dirty:0 writeback:34 unstable:0
slab_reclaimable:6298 slab_unreclaimable:74669
mapped:863 shmem:0 pagetables:100998 bounce:0
free:23573 free_pcp:1861 free_cma:0
Node 0 active_anon:404412kB inactive_anon:409040kB active_file:2012kB inactive_file:2176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3452kB dirty:0kB writeback:136kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1320845 all_unreclaimable? yes
DMA free:3296kB min:68kB low:84kB high:100kB active_anon:5540kB inactive_anon:0kB active_file:0kB inactive_file:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:248kB slab_unreclaimable:2628kB kernel_stack:792kB pagetables:2316kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 809 1965 1965
Normal free:3600kB min:3604kB low:4504kB high:5404kB active_anon:86304kB inactive_anon:0kB active_file:160kB inactive_file:376kB present:897016kB managed:858524kB mlocked:0kB slab_reclaimable:24944kB slab_unreclaimable:296048kB kernel_stack:163832kB pagetables:35892kB bounce:0kB free_pcp:3076kB local_pcp:656kB free_cma:0kB
lowmem_reserve[]: 0 0 9247 9247
HighMem free:86156kB min:512kB low:1796kB high:3080kB active_anon:312852kB inactive_anon:410024kB active_file:1924kB inactive_file:2012kB present:1183736kB managed:1183736kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:365784kB bounce:0kB free_pcp:3868kB local_pcp:720kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 8*4kB (UM) 8*8kB (UM) 4*16kB (M) 2*32kB (UM) 2*64kB (UM) 1*128kB (M) 3*256kB (UME) 2*512kB (UE) 1*1024kB (E) 0*2048kB 0*4096kB = 3296kB
Normal: 240*4kB (UME) 160*8kB (UME) 23*16kB (ME) 3*32kB (UE) 3*64kB (UME) 2*128kB (ME) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3408kB
HighMem: 10942*4kB (UM) 3102*8kB (UM) 866*16kB (UM) 76*32kB (UM) 11*64kB (UM) 4*128kB (UM) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 86344kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
54409 total pagecache pages
53215 pages in swap cache
Swap cache stats: add 300982, delete 247765, find 157978/226539
Free swap = 3803244kB
Total swap = 4192252kB
524186 pages RAM
295934 pages HighMem/MovableOnly
9642 pages reserved
0 pages cma reserved
With that, we can see that the normal zone has 86M of reclaimable memory, so
we know something goes wrong in reclaim (I will fix the problem in the next
patch).
Of course, with the per-zone lru stat restored, we could revert the
approximation that [1] introduced, but I intentionally don't touch it now
and will do so after getting comments.
[1] mm, vmstat: remove zone and node double accounting by approximating retries
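For illustration only, a minimal userspace sketch of the accounting this
patch adds: a single LRU size update bumps both the node-wide counter and
the counter of the zone the pages belong to, indexed as
NR_ZONE_LRU_BASE + lru. The toy_* names, the three-zone layout and the plain
arrays are assumptions made for the sketch; the kernel itself goes through
__mod_node_page_state() and __mod_zone_page_state().

```c
/* LRU list indices, in the same order the patch relies on. */
enum lru_list {
	LRU_INACTIVE_ANON,
	LRU_ACTIVE_ANON,
	LRU_INACTIVE_FILE,
	LRU_ACTIVE_FILE,
	NR_LRU_LISTS
};

/* Trimmed-down copy of the zone counters touched by the patch. */
enum zone_stat_item {
	NR_FREE_PAGES,
	NR_ZONE_LRU_BASE,
	NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
	NR_ZONE_ACTIVE_ANON,
	NR_ZONE_INACTIVE_FILE,
	NR_ZONE_ACTIVE_FILE,
	NR_ZONE_STAT_ITEMS
};

#define TOY_NR_ZONES 3	/* hypothetical: DMA, Normal, HighMem */

struct toy_zone {
	long vm_stat[NR_ZONE_STAT_ITEMS];
};

struct toy_node {
	long lru[NR_LRU_LISTS];			/* node-wide LRU sizes */
	struct toy_zone zones[TOY_NR_ZONES];	/* per-zone LRU sizes */
};

/* Account nr_pages (may be negative) on both the node and the zone. */
static void toy_update_lru_size(struct toy_node *node, int zid,
				enum lru_list lru, long nr_pages)
{
	node->lru[lru] += nr_pages;
	node->zones[zid].vm_stat[NR_ZONE_LRU_BASE + lru] += nr_pages;
}

/* Anonymous pages on the LRU of one zone (reclaimable if swap is free). */
static long toy_zone_anon_pages(const struct toy_zone *zone)
{
	return zone->vm_stat[NR_ZONE_INACTIVE_ANON] +
	       zone->vm_stat[NR_ZONE_ACTIVE_ANON];
}
```

Because the zone enum repeats the LRU order starting at NR_ZONE_LRU_BASE,
no lookup table is needed to map an LRU list to its zone counter.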
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/mm_inline.h | 2 ++
include/linux/mmzone.h | 5 +++++
mm/page_alloc.c | 8 ++++++++
mm/vmstat.c | 4 ++++
4 files changed, 19 insertions(+)
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index bcc4ed0..9cc130f 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -45,6 +45,8 @@ static __always_inline void __update_lru_size(struct lruvec *lruvec,
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
__mod_node_page_state(pgdat, NR_LRU_BASE + lru, nr_pages);
+ __mod_zone_page_state(&pgdat->node_zones[zid],
+ NR_ZONE_LRU_BASE + lru, nr_pages);
acct_highmem_file_pages(zid, lru, nr_pages);
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e6aca07..bbba91c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -110,6 +110,11 @@ struct zone_padding {
enum zone_stat_item {
/* First 128 byte cacheline (assuming 64 bit words) */
NR_FREE_PAGES,
+ NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
+ NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
+ NR_ZONE_ACTIVE_ANON,
+ NR_ZONE_INACTIVE_FILE,
+ NR_ZONE_ACTIVE_FILE,
NR_MLOCK, /* mlock()ed pages found and moved off LRU */
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 830ad49..1947dc5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4388,6 +4388,10 @@ void show_free_areas(unsigned int filter)
" min:%lukB"
" low:%lukB"
" high:%lukB"
+ " active_anon:%lukB"
+ " inactive_anon:%lukB"
+ " active_file:%lukB"
+ " inactive_file:%lukB"
" present:%lukB"
" managed:%lukB"
" mlocked:%lukB"
@@ -4405,6 +4409,10 @@ void show_free_areas(unsigned int filter)
K(min_wmark_pages(zone)),
K(low_wmark_pages(zone)),
K(high_wmark_pages(zone)),
+ K(zone_page_state(zone, NR_ZONE_ACTIVE_ANON)),
+ K(zone_page_state(zone, NR_ZONE_INACTIVE_ANON)),
+ K(zone_page_state(zone, NR_ZONE_ACTIVE_FILE)),
+ K(zone_page_state(zone, NR_ZONE_INACTIVE_FILE)),
K(zone->present_pages),
K(zone->managed_pages),
K(zone_page_state(zone, NR_MLOCK)),
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 91ecca9..b263484 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -921,6 +921,10 @@ int fragmentation_index(struct zone *zone, unsigned int order)
const char * const vmstat_text[] = {
/* enum zone_stat_item countes */
"nr_free_pages",
+ "nr_inactive_anon",
+ "nr_active_anon",
+ "nr_inactive_file",
+ "nr_active_file",
"nr_mlock",
"nr_slab_reclaimable",
"nr_slab_unreclaimable",
--
1.9.1
* [PATCH 2/2] mm: consider per-zone inactive ratio to deactivate
From: Minchan Kim @ 2016-07-19 15:50 UTC
To: Andrew Morton
Cc: Mel Gorman, Vlastimil Babka, Johannes Weiner, Joonsoo Kim,
linux-mm, linux-kernel, Minchan Kim
With the per-zone lru stat, we can see that the normal zone has 86M of
anonymous memory, yet OOM happened on a non-atomic order-0 allocation. Most
of the anonymous pages in the normal zone are on the active list.
gfp_mask=0x26004c0(GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK), order=0
Call Trace:
[<c51a76e2>] __alloc_pages_nodemask+0xe52/0xe60
[<c51f31dc>] ? new_slab+0x39c/0x3b0
[<c51f31dc>] new_slab+0x39c/0x3b0
[<c51f4eca>] ___slab_alloc.constprop.87+0x6da/0x840
[<c563e6fc>] ? __alloc_skb+0x3c/0x260
[<c50b8e93>] ? enqueue_task_fair+0x73/0xbf0
[<c5219ee0>] ? poll_select_copy_remaining+0x140/0x140
[<c5201645>] __slab_alloc.isra.81.constprop.86+0x40/0x6d
[<c563e6fc>] ? __alloc_skb+0x3c/0x260
[<c51f525c>] kmem_cache_alloc+0x22c/0x260
[<c563e6fc>] ? __alloc_skb+0x3c/0x260
[<c563e6fc>] __alloc_skb+0x3c/0x260
[<c563eece>] alloc_skb_with_frags+0x4e/0x1a0
[<c5638d6a>] sock_alloc_send_pskb+0x16a/0x1b0
[<c570b581>] ? wait_for_unix_gc+0x31/0x90
[<c57084dd>] unix_stream_sendmsg+0x28d/0x340
[<c5634dad>] sock_sendmsg+0x2d/0x40
[<c5634e2c>] sock_write_iter+0x6c/0xc0
[<c5204a90>] __vfs_write+0xc0/0x120
[<c52053ab>] vfs_write+0x9b/0x1a0
[<c51cc4a9>] ? __might_fault+0x49/0xa0
[<c52062c4>] SyS_write+0x44/0x90
[<c50036c6>] do_fast_syscall_32+0xa6/0x1e0
Mem-Info:
active_anon:101103 inactive_anon:102219 isolated_anon:0
active_file:503 inactive_file:544 isolated_file:0
unevictable:0 dirty:0 writeback:34 unstable:0
slab_reclaimable:6298 slab_unreclaimable:74669
mapped:863 shmem:0 pagetables:100998 bounce:0
free:23573 free_pcp:1861 free_cma:0
Node 0 active_anon:404412kB inactive_anon:409040kB active_file:2012kB inactive_file:2176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3452kB dirty:0kB writeback:136kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1320845 all_unreclaimable? yes
DMA free:3296kB min:68kB low:84kB high:100kB active_anon:5540kB inactive_anon:0kB active_file:0kB inactive_file:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:248kB slab_unreclaimable:2628kB kernel_stack:792kB pagetables:2316kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 809 1965 1965
Normal free:3600kB min:3604kB low:4504kB high:5404kB active_anon:86304kB inactive_anon:0kB active_file:160kB inactive_file:376kB present:897016kB managed:858524kB mlocked:0kB slab_reclaimable:24944kB slab_unreclaimable:296048kB kernel_stack:163832kB pagetables:35892kB bounce:0kB free_pcp:3076kB local_pcp:656kB free_cma:0kB
lowmem_reserve[]: 0 0 9247 9247
HighMem free:86156kB min:512kB low:1796kB high:3080kB active_anon:312852kB inactive_anon:410024kB active_file:1924kB inactive_file:2012kB present:1183736kB managed:1183736kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:365784kB bounce:0kB free_pcp:3868kB local_pcp:720kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 8*4kB (UM) 8*8kB (UM) 4*16kB (M) 2*32kB (UM) 2*64kB (UM) 1*128kB (M) 3*256kB (UME) 2*512kB (UE) 1*1024kB (E) 0*2048kB 0*4096kB = 3296kB
Normal: 240*4kB (UME) 160*8kB (UME) 23*16kB (ME) 3*32kB (UE) 3*64kB (UME) 2*128kB (ME) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3408kB
HighMem: 10942*4kB (UM) 3102*8kB (UM) 866*16kB (UM) 76*32kB (UM) 11*64kB (UM) 4*128kB (UM) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 86344kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
54409 total pagecache pages
53215 pages in swap cache
Swap cache stats: add 300982, delete 247765, find 157978/226539
Free swap = 3803244kB
Total swap = 4192252kB
524186 pages RAM
295934 pages HighMem/MovableOnly
9642 pages reserved
0 pages cma reserved
The problem was the current deactivation logic (i.e., inactive_list_is_low).
Node 0 active_anon:404412kB inactive_anon:409040kB
IOW, (inactive_anon of node * inactive_ratio > active_anon of node) because
of the highmem anonymous stat, so the VM never deactivates the normal zone's
anonymous pages.
This patch makes inactive_list_is_low aware of the per-zone inactive ratio,
so if the LRU list of the requested zone or of a lower zone cannot meet the
ratio, the VM starts to deactivate pages.
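To make the arithmetic concrete, here is a small userspace sketch that
applies the same ratio test to the figures from the report above. It
assumes 4kB pages (PAGE_SHIFT of 12), and isqrt() is only a stand-in for
the kernel's int_sqrt(); the helper names are hypothetical.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4kB pages */

/* Stand-in for the kernel's int_sqrt(): floor of the square root. */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Convert a kB figure from the OOM report into a page count. */
static unsigned long kb_to_pages(unsigned long kb)
{
	return kb >> (PAGE_SHIFT - 10);
}

/* Same test as inactive_list_is_low(), on page counts passed in. */
static bool ratio_says_low(unsigned long inactive, unsigned long active)
{
	unsigned long gb = (inactive + active) >> (30 - PAGE_SHIFT);
	unsigned long inactive_ratio = gb ? isqrt(10 * gb) : 1;

	return inactive * inactive_ratio < active;
}
```

With the node-wide numbers (inactive_anon 409040kB, active_anon 404412kB)
the total is below 1GB, inactive_ratio is 1 and the test is false, so
nothing gets deactivated. With the Normal zone's own numbers (inactive_anon
0kB, active_anon 86304kB) the same test is true.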
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 41 ++++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dffa7e0..e60329b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1955,12 +1955,16 @@ static void shrink_active_list(unsigned long nr_to_scan,
* 1TB 101 10GB
* 10TB 320 32GB
*/
-static bool inactive_list_is_low(struct lruvec *lruvec, bool file)
+static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
+ int classzone_idx)
{
unsigned long inactive_ratio;
unsigned long inactive;
unsigned long active;
unsigned long gb;
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ struct zone *zone;
+ int zone_idx;
/*
* If we don't have swap space, anonymous page deactivation
@@ -1969,23 +1973,34 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file)
if (!file && !total_swap_pages)
return false;
- inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
- active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
+ for (zone_idx = classzone_idx; zone_idx > 0; zone_idx--) {
+ zone = &pgdat->node_zones[zone_idx];
+ if (!populated_zone(zone))
+ continue;
- gb = (inactive + active) >> (30 - PAGE_SHIFT);
- if (gb)
- inactive_ratio = int_sqrt(10 * gb);
- else
- inactive_ratio = 1;
+ inactive = zone_page_state(zone,
+ NR_ZONE_LRU_BASE + file * LRU_FILE);
+ active = zone_page_state(zone,
+ NR_ZONE_LRU_BASE + file * LRU_FILE + LRU_ACTIVE);
- return inactive * inactive_ratio < active;
+ gb = (inactive + active) >> (30 - PAGE_SHIFT);
+ if (gb)
+ inactive_ratio = int_sqrt(10 * gb);
+ else
+ inactive_ratio = 1;
+
+ if ((inactive * inactive_ratio) < active)
+ return true;
+ }
+
+ return false;
}
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
struct lruvec *lruvec, struct scan_control *sc)
{
if (is_active_lru(lru)) {
- if (inactive_list_is_low(lruvec, is_file_lru(lru)))
+ if (inactive_list_is_low(lruvec, is_file_lru(lru), sc->reclaim_idx))
shrink_active_list(nr_to_scan, lruvec, sc, lru);
return 0;
}
@@ -2116,7 +2131,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
* lruvec even if it has plenty of old anonymous pages unless the
* system is under heavy pressure.
*/
- if (!inactive_list_is_low(lruvec, true) &&
+ if (!inactive_list_is_low(lruvec, true, sc->reclaim_idx) &&
lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
scan_balance = SCAN_FILE;
goto out;
@@ -2358,7 +2373,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
* Even if we did not try to evict anon pages at all, we want to
* rebalance the anon lru active/inactive ratio.
*/
- if (inactive_list_is_low(lruvec, false))
+ if (inactive_list_is_low(lruvec, false, sc->reclaim_idx))
shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
sc, LRU_ACTIVE_ANON);
@@ -3011,7 +3026,7 @@ static void age_active_anon(struct pglist_data *pgdat,
do {
struct lruvec *lruvec = mem_cgroup_lruvec(pgdat, memcg);
- if (inactive_list_is_low(lruvec, false))
+ if (inactive_list_is_low(lruvec, false, sc->reclaim_idx))
shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
sc, LRU_ACTIVE_ANON);
--
1.9.1
* rebalance the anon lru active/inactive ratio.
*/
- if (inactive_list_is_low(lruvec, false))
+ if (inactive_list_is_low(lruvec, false, sc->reclaim_idx))
shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
sc, LRU_ACTIVE_ANON);
@@ -3011,7 +3026,7 @@ static void age_active_anon(struct pglist_data *pgdat,
do {
struct lruvec *lruvec = mem_cgroup_lruvec(pgdat, memcg);
- if (inactive_list_is_low(lruvec, false))
+ if (inactive_list_is_low(lruvec, false, sc->reclaim_idx))
shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
sc, LRU_ACTIVE_ANON);
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] mm: add per-zone lru list stat
2016-07-19 15:50 ` Minchan Kim
@ 2016-07-19 16:48 ` Mel Gorman
-1 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2016-07-19 16:48 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, Vlastimil Babka, Johannes Weiner, Joonsoo Kim,
linux-mm, linux-kernel
On Wed, Jul 20, 2016 at 12:50:32AM +0900, Minchan Kim wrote:
> While I did stress test with hackbench, I got OOM message frequently
> which didn't ever happen in zone-lru.
>
This one also showed pgdat going unreclaimable early. Have you tried any
of the three oom-related patches I sent to Joonsoo to see what impact,
if any, it had?
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] mm: add per-zone lru list stat
2016-07-19 16:48 ` Mel Gorman
@ 2016-07-20 0:16 ` Minchan Kim
-1 siblings, 0 replies; 10+ messages in thread
From: Minchan Kim @ 2016-07-20 0:16 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Vlastimil Babka, Johannes Weiner, Joonsoo Kim,
linux-mm, linux-kernel
On Tue, Jul 19, 2016 at 05:48:57PM +0100, Mel Gorman wrote:
> On Wed, Jul 20, 2016 at 12:50:32AM +0900, Minchan Kim wrote:
> > While I did stress test with hackbench, I got OOM message frequently
> > which didn't ever happen in zone-lru.
> >
>
> This one also showed pgdat going unreclaimable early. Have you tried any
> of the three oom-related patches I sent to Joonsoo to see what impact,
> if any, it had?
Before the result, I want to restate the goal of this patch.
Without per-zone LRU stats, it's really hard to debug an OOM problem on a
system with multiple zones, so regardless of whether this problem gets
solved, we should add per-zone LRU stats to make OOMs easier to debug,
which has never had a perfect solution.
You sent 3 patches in that thread, and the first one is the same as the one
I had already applied when I first found this problem. It didn't solve the
problem, so I tested the last one:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a6f31617a08c..0dc443b52228 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1415,7 +1415,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
LIST_HEAD(pages_skipped);
for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
- !list_empty(src); scan++) {
+ !list_empty(src);) {
struct page *page;
page = lru_to_page(src);
@@ -1428,6 +1428,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
nr_skipped[page_zonenum(page)]++;
continue;
}
+
+ /* Pages skipped do not contribute to scan */
+ scan++;
switch (__isolate_lru_page(page, mode)) {
case 0:
The result is not OOM but hackbench stalls forever.
When I sampled vmstat every 2 seconds, I found that the pgskip_high rate is
far too high (i.e., 100000000 pages per 2 sec) while pgscan_direct and
pgdeactivate are really low (i.e., 30 pages per 2 sec).
The reason it doesn't trigger OOM is that a small number of pages (i.e., 20
pages per sec) are freed, so NR_PAGES_SCANNED is always reset to zero.
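The gap between the skip rate and the scan rate can be illustrated with a simplified userspace model of the loop in isolate_lru_pages(), not the kernel code itself. It models the LRU list as an array of zone indexes, assumes pages from zones above reclaim_idx are the ones skipped (the skip condition is not shown in the quoted hunk), treats every eligible page as successfully isolated, and uses a hypothetical count_skipped flag to switch between the original accounting (skipped pages advance scan) and the tested change (they do not).

```c
#include <stdbool.h>
#include <stddef.h>

/* Counters for one simulated call; the names are illustrative only. */
struct scan_result {
	unsigned long scanned;
	unsigned long skipped;
	unsigned long taken;
};

/*
 * Walk page_zone[] like an LRU list. Pages from zones above reclaim_idx
 * are skipped. If count_skipped is true, a skipped page still advances
 * the scan counter (original loop); otherwise only eligible pages do
 * (tested change).
 */
static struct scan_result simulate_isolate(const int *page_zone,
					   size_t nr_pages, int reclaim_idx,
					   unsigned long nr_to_scan,
					   bool count_skipped)
{
	struct scan_result r = { 0, 0, 0 };
	size_t i = 0;

	while (r.scanned < nr_to_scan && r.taken < nr_to_scan &&
	       i < nr_pages) {
		int zone = page_zone[i++];

		if (zone > reclaim_idx) {
			r.skipped++;
			if (count_skipped)
				r.scanned++;
			continue;
		}

		/* Assumption: every eligible page is isolated successfully. */
		r.scanned++;
		r.taken++;
	}
	return r;
}
```

For a list of 1000 pages where the first 990 belong to highmem (zone index 2) and the request is limited to reclaim_idx 1 with nr_to_scan 32, the original accounting stops after 32 skipped pages with nothing taken, while the changed accounting skips all 990 highmem pages before it scans the 10 eligible ones.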
>
> --
> Mel Gorman
> SUSE Labs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] mm: add per-zone lru list stat
2016-07-20 0:16 ` Minchan Kim
@ 2016-07-20 10:55 ` Mel Gorman
-1 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2016-07-20 10:55 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, Vlastimil Babka, Johannes Weiner, Joonsoo Kim,
linux-mm, linux-kernel
On Wed, Jul 20, 2016 at 09:16:24AM +0900, Minchan Kim wrote:
> On Tue, Jul 19, 2016 at 05:48:57PM +0100, Mel Gorman wrote:
> > On Wed, Jul 20, 2016 at 12:50:32AM +0900, Minchan Kim wrote:
> > > While I did stress test with hackbench, I got OOM message frequently
> > > which didn't ever happen in zone-lru.
> > >
> >
> > This one also showed pgdat going unreclaimable early. Have you tried any
> > of the three oom-related patches I sent to Joonsoo to see what impact,
> > if any, it had?
>
> Before the result, I want to restate the goal of this patch.
> Without per-zone LRU stats, it's really hard to debug an OOM problem on a
> system with multiple zones, so regardless of whether this problem gets
> solved, we should add per-zone LRU stats to make OOMs easier to debug,
> which has never had a perfect solution.
>
That's not in dispute, I simply wanted to know the impact.
> The result is not OOM but hackbench stalls forever.
Ok, that suggests that the premature marking of pgdats as unreclaimable
and the inactive rotation are both problems.
I have a series prepared that may or may not address the problem. I'm
trying to reproduce the OOM killer on a 32-bit KVM but so far no luck.
If I fail to reproduce it then I cannot tell if the series has an impact
and may have to post it and hope you and Joonsoo can test it.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages
2016-07-19 15:50 [PATCH 1/2] mm: add per-zone lru list stat Minchan Kim
2016-07-19 15:50 ` Minchan Kim
2016-07-19 15:50 ` [PATCH 2/2] mm: consider per-zone inactive ratio to deactivate Minchan Kim
2016-07-19 15:50 ` Minchan Kim
2016-07-19 16:48 ` [PATCH 1/2] mm: add per-zone lru list stat Mel Gorman
2016-07-19 16:48 ` Mel Gorman
2016-07-20 0:16 ` Minchan Kim
2016-07-20 0:16 ` Minchan Kim
2016-07-20 10:55 ` Mel Gorman
2016-07-20 10:55 ` Mel Gorman