linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] vmscan: scan pages until it founds eligible pages
       [not found] <1493700038-27091-1-git-send-email-minchan@kernel.org>
@ 2017-05-02  5:14 ` Minchan Kim
  2017-05-02  7:54   ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2017-05-02  5:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Michal Hocko, kernel-team,
	linux-kernel, linux-mm

Oops, forgot to add lkml and linux-mm.
Sorry for that.
Send it again.

>From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 2 May 2017 12:34:05 +0900
Subject: [PATCH] vmscan: scan pages until it founds eligible pages

On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
Premature OOMs are happening: although there is a ton of free swap
and plenty of pages on the anonymous LRU lists of eligible zones, the
OOM killer fires.

Investigation showed that the page-skipping logic in isolate_lru_pages
makes reclaim void: it easily returns a zero nr_taken, so LRU shrinking
accomplishes effectively nothing while the scan priority is raised
aggressively. Finally, OOM happens.

This patch makes isolate_lru_pages keep scanning until it encounters
pages of eligible zones or the scan becomes too large (i.e., the
node's LRU size).

balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x65/0x87
 dump_header.isra.19+0x8f/0x20f
 ? preempt_count_add+0x9e/0xb0
 ? _raw_spin_unlock_irqrestore+0x24/0x40
 oom_kill_process+0x21d/0x3f0
 ? has_capability_noaudit+0x17/0x20
 out_of_memory+0xd8/0x390
 __alloc_pages_slowpath+0xbc1/0xc50
 ? anon_vma_interval_tree_insert+0x84/0x90
 __alloc_pages_nodemask+0x1a5/0x1c0
 pte_alloc_one+0x20/0x50
 __pte_alloc+0x1e/0x110
 __handle_mm_fault+0x919/0x960
 handle_mm_fault+0x77/0x120
 __do_page_fault+0x27a/0x550
 trace_do_page_fault+0x43/0x150
 do_async_page_fault+0x2c/0x90
 async_page_fault+0x28/0x30
RIP: 0033:0x7fc4636bacb8
RSP: 002b:00007fff97c9c4c0 EFLAGS: 00010202
RAX: 00007fc3e818d000 RBX: 00007fc4639f8760 RCX: 00007fc46372e9ca
RDX: 0000000000101002 RSI: 0000000000101000 RDI: 0000000000000000
RBP: 0000000000100010 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 00000000000a3901 R12: 00007fc3e818d010
R13: 0000000000101000 R14: 00007fc4639f87b8 R15: 00007fc4639f87b8
Mem-Info:
active_anon:424716 inactive_anon:65314 isolated_anon:0
 active_file:52 inactive_file:46 isolated_file:0
 unevictable:0 dirty:27 writeback:0 unstable:0
 slab_reclaimable:3967 slab_unreclaimable:4125
 mapped:133 shmem:43 pagetables:1674 bounce:0
 free:4637 free_pcp:225 free_cma:0
Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 992 992 1952
DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 959
Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
378 total pagecache pages
17 pages in swap cache
Swap cache stats: add 17325, delete 17302, find 0/27
Free swap  = 978940kB
Total swap = 1048572kB
524157 pages RAM
0 pages HighMem/MovableOnly
12629 pages reserved
0 pages cma reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
[  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd
...

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2314aca47d12..1fec21d155b3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1488,12 +1488,20 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	unsigned long nr_taken = 0;
 	unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
 	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
+	unsigned long total_skipped = 0;
 	unsigned long skipped = 0;
 	unsigned long scan, nr_pages;
+	unsigned long lru_size;
 	LIST_HEAD(pages_skipped);
 
+	if (!mem_cgroup_disabled())
+		lru_size = mem_cgroup_get_lru_size(lruvec, lru);
+	else
+		lru_size = node_page_state(lruvec_pgdat(lruvec),
+						NR_LRU_BASE + lru);
+
 	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
-					!list_empty(src); scan++) {
+		!list_empty(src) && (scan + total_skipped < lru_size); scan++) {
 		struct page *page;
 
 		page = lru_to_page(src);
@@ -1502,8 +1510,25 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		VM_BUG_ON_PAGE(!PageLRU(page), page);
 
 		if (page_zonenum(page) > sc->reclaim_idx) {
+			if (skipped > SWAP_CLUSTER_MAX) {
+				int zid;
+
+				list_splice_init(&pages_skipped, src);
+				for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+					if (!nr_skipped[zid])
+						continue;
+					__count_zid_vm_events(PGSCAN_SKIP, zid,
+							nr_skipped[zid]);
+					total_skipped += nr_skipped[zid];
+					nr_skipped[zid] = 0;
+				}
+				skipped = 0;
+			}
+
 			list_move(&page->lru, &pages_skipped);
 			nr_skipped[page_zonenum(page)]++;
+			skipped++;
+			scan--;
 			continue;
 		}
 
@@ -1541,12 +1566,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 				continue;
 
 			__count_zid_vm_events(PGSCAN_SKIP, zid, nr_skipped[zid]);
-			skipped += nr_skipped[zid];
+			total_skipped += nr_skipped[zid];
 		}
 	}
-	*nr_scanned = scan;
+	*nr_scanned = scan + total_skipped;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
-				    scan, skipped, nr_taken, mode, lru);
+				    scan, total_skipped, nr_taken, mode, lru);
 	update_lru_sizes(lruvec, lru, nr_zone_taken);
 	return nr_taken;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] vmscan: scan pages until it founds eligible pages
  2017-05-02  5:14 ` [PATCH] vmscan: scan pages until it founds eligible pages Minchan Kim
@ 2017-05-02  7:54   ` Michal Hocko
  2017-05-02 14:51     ` Minchan Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-05-02  7:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> Oops, forgot to add lkml and linux-mm.
> Sorry for that.
> Send it again.
> 
> >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Tue, 2 May 2017 12:34:05 +0900
> Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> 
> On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> There are premature OOM happening. Although there are a ton of free
> swap and anonymous LRU list of elgible zones, OOM happened.
> 
> With investigation, skipping page of isolate_lru_pages makes reclaim
> void because it returns zero nr_taken easily so LRU shrinking is
> effectively nothing and just increases priority aggressively.
> Finally, OOM happens.

I am not really sure I understand the problem you are facing. Could you
be more specific please? What is your configuration etc...

> balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
[...]
> Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 992 992 1952
> DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 959

Hmm, DMA32 has sufficient free memory to allow this order-0 request.
The inactive anon LRU is basically empty. Why don't we rotate the really
large active anon list? Isn't this the primary problem?

I haven't really looked at the patch deeply yet. It looks quite scary at
first sight though. I would really like to understand what exactly is
going on here before we move to a patch to fix it.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] vmscan: scan pages until it founds eligible pages
  2017-05-02  7:54   ` Michal Hocko
@ 2017-05-02 14:51     ` Minchan Kim
  2017-05-02 15:14       ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2017-05-02 14:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

Hi Michal,

On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > Oops, forgot to add lkml and linux-mm.
> > Sorry for that.
> > Send it again.
> > 
> > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > From: Minchan Kim <minchan@kernel.org>
> > Date: Tue, 2 May 2017 12:34:05 +0900
> > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > 
> > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > There are premature OOM happening. Although there are a ton of free
> > swap and anonymous LRU list of elgible zones, OOM happened.
> > 
> > With investigation, skipping page of isolate_lru_pages makes reclaim
> > void because it returns zero nr_taken easily so LRU shrinking is
> > effectively nothing and just increases priority aggressively.
> > Finally, OOM happens.
> 
> I am not really sure I understand the problem you are facing. Could you
> be more specific please? What is your configuration etc...

Sure: a KVM guest on x86_64 with 2G of memory and 1G of swap, configured
with movablecore=1G to simulate a highmem zone.
The workload is a process that consumes 2.2G of memory and then randomly
touches its address space, causing lots of swap in/out.

> 
> > balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
> [...]
> > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > lowmem_reserve[]: 0 992 992 1952
> > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > lowmem_reserve[]: 0 0 0 959
> 
> Hmm DMA32 has sufficient free memory to allow this order-0 request.
> Inactive anon lru is basically empty. Why do not we rotate a really
> large active anon list? Isn't this the primary problem?

It's a side effect of the page-skipping logic in isolate_lru_pages
that I mentioned in the changelog above.

The problem is a lot of anonymous memory in the movable zone (i.e.,
highmem) and a non-trivial amount of memory in the DMA32 zone. Under
heavy memory pressure, a GFP_KERNEL page request triggers reclaim. The
VM sees that the inactive list is low, so it tries to deactivate pages.
To do that, it first tries to isolate pages from the active list, but
there are lots of anonymous pages from the movable zone, so the skipping
logic in isolate_lru_pages kicks in. As a result, isolate_lru_pages
cannot isolate any eligible pages, so the reclaim attempt is effectively
void. It keeps running into OOM.

I'm on a long vacation from today, so please understand if my responses are slow.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] vmscan: scan pages until it founds eligible pages
  2017-05-02 14:51     ` Minchan Kim
@ 2017-05-02 15:14       ` Michal Hocko
  2017-05-03  4:48         ` Minchan Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-05-02 15:14 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Tue 02-05-17 23:51:50, Minchan Kim wrote:
> Hi Michal,
> 
> On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> > On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > > Oops, forgot to add lkml and linux-mm.
> > > Sorry for that.
> > > Send it again.
> > > 
> > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > > From: Minchan Kim <minchan@kernel.org>
> > > Date: Tue, 2 May 2017 12:34:05 +0900
> > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > > 
> > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > > There are premature OOM happening. Although there are a ton of free
> > > swap and anonymous LRU list of elgible zones, OOM happened.
> > > 
> > > With investigation, skipping page of isolate_lru_pages makes reclaim
> > > void because it returns zero nr_taken easily so LRU shrinking is
> > > effectively nothing and just increases priority aggressively.
> > > Finally, OOM happens.
> > 
> > I am not really sure I understand the problem you are facing. Could you
> > be more specific please? What is your configuration etc...
> 
> Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured
> movablecore=1G to simulate highmem zone.
> Workload is a process consumes 2.2G memory and then random touch the
> address space so it makes lots of swap in/out.
> 
> > 
> > > balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
> > [...]
> > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > lowmem_reserve[]: 0 992 992 1952
> > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > > lowmem_reserve[]: 0 0 0 959
> > 
> > Hmm DMA32 has sufficient free memory to allow this order-0 request.
> > Inactive anon lru is basically empty. Why do not we rotate a really
> > large active anon list? Isn't this the primary problem?
> 
> It's a side effect by skipping page logic in isolate_lru_pages
> I mentioned above in changelog.
> 
> The problem is a lot of anonymous memory in movable zone(ie, highmem)
> and non-small memory in DMA32 zone.

Such a configuration is questionable on its own. But let's leave that
part aside.

> In heavy memory pressure,
> requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list
> is low so it tries to deactivate pages. For it, first of all, it tries
> to isolate pages from active list but there are lots of anonymous pages
> from movable zone so skipping logic in isolate_lru_pages works. With
> the result, isolate_lru_pages cannot isolate any eligible pages so
> reclaim trial is effectively void. It continues to meet OOM.

But skipped pages should be rotated, and we should eventually hit pages
from the right zone(s). Moreover, we should scan the full LRU at priority
0, so why exactly do we hit the OOM killer?

Anyway [1] has changed this behavior. Are you seeing the issue with this
patch dropped?

[1] http://www.ozlabs.org/~akpm/mmotm/broken-out/revert-mm-vmscan-account-for-skipped-pages-as-a-partial-scan.patch
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] vmscan: scan pages until it founds eligible pages
  2017-05-02 15:14       ` Michal Hocko
@ 2017-05-03  4:48         ` Minchan Kim
  2017-05-03  6:00           ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2017-05-03  4:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote:
> On Tue 02-05-17 23:51:50, Minchan Kim wrote:
> > Hi Michal,
> > 
> > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> > > On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > > > Oops, forgot to add lkml and linux-mm.
> > > > Sorry for that.
> > > > Send it again.
> > > > 
> > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > > > From: Minchan Kim <minchan@kernel.org>
> > > > Date: Tue, 2 May 2017 12:34:05 +0900
> > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > > > 
> > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > > > There are premature OOM happening. Although there are a ton of free
> > > > swap and anonymous LRU list of elgible zones, OOM happened.
> > > > 
> > > > With investigation, skipping page of isolate_lru_pages makes reclaim
> > > > void because it returns zero nr_taken easily so LRU shrinking is
> > > > effectively nothing and just increases priority aggressively.
> > > > Finally, OOM happens.
> > > 
> > > I am not really sure I understand the problem you are facing. Could you
> > > be more specific please? What is your configuration etc...
> > 
> > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured
> > movablecore=1G to simulate highmem zone.
> > Workload is a process consumes 2.2G memory and then random touch the
> > address space so it makes lots of swap in/out.
> > 
> > > 
> > > > balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
> > > [...]
> > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > lowmem_reserve[]: 0 992 992 1952
> > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > > > lowmem_reserve[]: 0 0 0 959
> > > 
> > > Hmm DMA32 has sufficient free memory to allow this order-0 request.
> > > Inactive anon lru is basically empty. Why do not we rotate a really
> > > large active anon list? Isn't this the primary problem?
> > 
> > It's a side effect by skipping page logic in isolate_lru_pages
> > I mentioned above in changelog.
> > 
> > The problem is a lot of anonymous memory in movable zone(ie, highmem)
> > and non-small memory in DMA32 zone.
> 
> Such a configuration is questionable on its own. But let't keep this
> part alone.

It seems you misunderstood: this is really common on 32-bit.
Think of a 2G DRAM system on 32-bit: normally that is 1G normal : 1G
highmem, which is almost the same as the configuration I used.

> 
> > In heavy memory pressure,
> > requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list
> > is low so it tries to deactivate pages. For it, first of all, it tries
> > to isolate pages from active list but there are lots of anonymous pages
> > from movable zone so skipping logic in isolate_lru_pages works. With
> > the result, isolate_lru_pages cannot isolate any eligible pages so
> > reclaim trial is effectively void. It continues to meet OOM.
> 
> But skipped pages should be rotated and we should eventually hit pages
> from the right zone(s). Moreover we should scan the full LRU at priority
> 0 so why exactly we hit the OOM killer?

Yes, it is a full scan at priority 0, but keep in mind that the number of
LRU pages to scan is derived from the eligible pages, not from all pages
of the node. And isolate_lru_pages accounts skipped pages toward the scan
count, so the VM cannot isolate any eligible pages from the LRU when
ineligible pages make up most of it.

> 
> Anyway [1] has changed this behavior. Are you seeing the issue with this
> patch dropped?

Good point. Before that patch, skipped pages did not increase the scan
count, so with [1] reverted I guess it might work, but I worry that
isolating lots of skipped pages onto the temporary pages_skipped list
might cause a premature OOM. Anyway, I will test it when I return to
the office after my vacation.

Thanks.

> 
> [1] http://www.ozlabs.org/~akpm/mmotm/broken-out/revert-mm-vmscan-account-for-skipped-pages-as-a-partial-scan.patch
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] vmscan: scan pages until it founds eligible pages
  2017-05-03  4:48         ` Minchan Kim
@ 2017-05-03  6:00           ` Michal Hocko
  2017-05-10  1:46             ` Minchan Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-05-03  6:00 UTC (permalink / raw)
  To: Minchan Kim, Johannes Weiner
  Cc: Andrew Morton, Mel Gorman, kernel-team, linux-kernel, linux-mm

On Wed 03-05-17 13:48:09, Minchan Kim wrote:
> On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote:
> > On Tue 02-05-17 23:51:50, Minchan Kim wrote:
> > > Hi Michal,
> > > 
> > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> > > > On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > > > > Oops, forgot to add lkml and linux-mm.
> > > > > Sorry for that.
> > > > > Send it again.
> > > > > 
> > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > > > > From: Minchan Kim <minchan@kernel.org>
> > > > > Date: Tue, 2 May 2017 12:34:05 +0900
> > > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > > > > 
> > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote:
> > > > > There are premature OOM happening. Although there are a ton of free
> > > > > swap and anonymous LRU list of elgible zones, OOM happened.
> > > > > 
> > > > > With investigation, skipping page of isolate_lru_pages makes reclaim
> > > > > void because it returns zero nr_taken easily so LRU shrinking is
> > > > > effectively nothing and just increases priority aggressively.
> > > > > Finally, OOM happens.
> > > > 
> > > > I am not really sure I understand the problem you are facing. Could you
> > > > be more specific please? What is your configuration etc...
> > > 
> > > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured
> > > movablecore=1G to simulate highmem zone.
> > > Workload is a process consumes 2.2G memory and then random touch the
> > > address space so it makes lots of swap in/out.
> > > 
> > > > 
> > > > > balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
> > > > [...]
> > > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > > lowmem_reserve[]: 0 992 992 1952
> > > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > > > > lowmem_reserve[]: 0 0 0 959
> > > > 
> > > > Hmm DMA32 has sufficient free memory to allow this order-0 request.
> > > > Inactive anon lru is basically empty. Why do not we rotate a really
> > > > large active anon list? Isn't this the primary problem?
> > > 
> > > It's a side effect by skipping page logic in isolate_lru_pages
> > > I mentioned above in changelog.
> > > 
> > > The problem is a lot of anonymous memory in movable zone(ie, highmem)
> > > and non-small memory in DMA32 zone.
> > 
> > Such a configuration is questionable on its own. But let't keep this
> > part alone.
> 
> It seems you are misunderstood. It's really common on 32bit.

Yes, I am not arguing about 32b systems. It is quite common to see
issues which are inherent to the highmem zone.

> Think of 2G DRAM system on 32bit. Normally, it's 1G normal:1G highmem.
> It's almost same with one I configured.
> 
> > 
> > > In heavy memory pressure,
> > > requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list
> > > is low so it tries to deactivate pages. For it, first of all, it tries
> > > to isolate pages from active list but there are lots of anonymous pages
> > > from movable zone so skipping logic in isolate_lru_pages works. With
> > > the result, isolate_lru_pages cannot isolate any eligible pages so
> > > reclaim trial is effectively void. It continues to meet OOM.
> > 
> > But skipped pages should be rotated and we should eventually hit pages
> > from the right zone(s). Moreover we should scan the full LRU at priority
> > 0 so why exactly we hit the OOM killer?
> 
> Yes, full scan in priority 0 but keep it in mind that the number of full
> LRU pages to scan is one of eligible pages, not all pages of the node.

I have a hard time understanding what you are trying to say here.

> And isolate_lru_pages have accounted skipped pages as scan count so that
> VM cannot isolate any pages of eligible pages in LRU if non-eligible pages
> are a lot in the LRU.
> 
> > 
> > Anyway [1] has changed this behavior. Are you seeing the issue with this
> > patch dropped?
> 
> Good point. Before the patch, it didn't increase scan count with skipped
> pages so with reverting [1], I guess it might work but worry about
> isolating lots of skipped pages into temporal pages_skipped list which
> might causes premate OOM. Anyway, I will test it when I returns at
> office after vacation.

I do not think we want to drop this patch. I think it might be good
enough to simply fold this into it:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 24efcc20af91..ac146f10f222 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1472,7 +1472,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	LIST_HEAD(pages_skipped);
 
 	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
-					!list_empty(src); scan++) {
+					!list_empty(src);) {
 		struct page *page;
 
 		page = lru_to_page(src);
@@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			continue;
 		}
 
+		/*
+		 * Do not count skipped pages because we do want to isolate
+		 * some pages even when the LRU mostly contains ineligible
+		 * pages
+		 */
+		scan++;
 		switch (__isolate_lru_page(page, mode)) {
 		case 0:
 			nr_pages = hpage_nr_pages(page);

What do you think Johannes?

> > [1] http://www.ozlabs.org/~akpm/mmotm/broken-out/revert-mm-vmscan-account-for-skipped-pages-as-a-partial-scan.patch
> > -- 
> > Michal Hocko
> > SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] vmscan: scan pages until it founds eligible pages
  2017-05-03  6:00           ` Michal Hocko
@ 2017-05-10  1:46             ` Minchan Kim
  2017-05-10  6:13               ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2017-05-10  1:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Andrew Morton, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
> On Wed 03-05-17 13:48:09, Minchan Kim wrote:
> > On Tue, May 02, 2017 at 05:14:36PM +0200, Michal Hocko wrote:
> > > On Tue 02-05-17 23:51:50, Minchan Kim wrote:
> > > > Hi Michal,
> > > > 
> > > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote:
> > > > > On Tue 02-05-17 14:14:52, Minchan Kim wrote:
> > > > > > Oops, forgot to add lkml and linux-mm.
> > > > > > Sorry for that.
> > > > > > Send it again.
> > > > > > 
> > > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001
> > > > > > From: Minchan Kim <minchan@kernel.org>
> > > > > > Date: Tue, 2 May 2017 12:34:05 +0900
> > > > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages
> > > > > > 
> > > > > > Premature OOMs are happening. Although there is a ton of free
> > > > > > swap and plenty of pages on the anonymous LRU lists of eligible
> > > > > > zones, OOM happened.
> > > > > > 
> > > > > > Investigation showed that the page-skipping logic of
> > > > > > isolate_lru_pages makes reclaim void because it easily returns a
> > > > > > zero nr_taken, so LRU shrinking is effectively a no-op and just
> > > > > > raises the priority aggressively. Finally, OOM happens.
> > > > > 
> > > > > I am not really sure I understand the problem you are facing. Could you
> > > > > be more specific please? What is your configuration etc...
> > > > 
> > > > Sure. It's a KVM guest on x86_64 with 2G memory and 1G swap,
> > > > configured with movablecore=1G to simulate a highmem zone.
> > > > The workload is a process that consumes 2.2G of memory and then
> > > > randomly touches its address space, so it causes lots of swap in/out.
> > > > 
> > > > > 
> > > > > > balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
> > > > > [...]
> > > > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > > > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > > > lowmem_reserve[]: 0 992 992 1952
> > > > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
> > > > > > lowmem_reserve[]: 0 0 0 959
> > > > > 
> > > > > Hmm, DMA32 has sufficient free memory to allow this order-0 request.
> > > > > The inactive anon LRU is basically empty. Why do we not rotate the
> > > > > really large active anon list? Isn't that the primary problem?
> > > > 
> > > > It's a side effect of the page-skipping logic in isolate_lru_pages
> > > > that I mentioned above in the changelog.
> > > > 
> > > > The problem is a lot of anonymous memory in the movable zone (i.e.,
> > > > highmem) and a non-trivial amount of memory in the DMA32 zone.
> > > 
> > > Such a configuration is questionable on its own. But let's leave this
> > > part alone.
> > 
> > It seems you misunderstood. It's really common on 32bit.
> 
> Yes, I am not arguing about 32b systems. It is quite common to see
> issues which are inherent to the highmem zone.
> 
> > Think of a 2G DRAM system on 32bit. Normally, it's 1G normal : 1G highmem.
> > That's almost the same as the setup I configured.
> > 
> > > 
> > > > Under heavy memory pressure,
> > > > requesting a page with GFP_KERNEL triggers reclaim. The VM knows the
> > > > inactive list is low, so it tries to deactivate pages. To do that, it
> > > > first tries to isolate pages from the active list, but there are lots
> > > > of anonymous pages from the movable zone, so the skipping logic in
> > > > isolate_lru_pages kicks in. As a result, isolate_lru_pages cannot
> > > > isolate any eligible pages, so the reclaim attempt is effectively
> > > > void. This continues until we meet OOM.
> > > 
> > > But skipped pages should be rotated and we should eventually hit pages
> > > from the right zone(s). Moreover, we should scan the full LRU at priority
> > > 0, so why exactly do we hit the OOM killer?
> > 
> > Yes, it's a full scan at priority 0, but keep in mind that the number of
> > LRU pages to scan is derived from the eligible pages only, not from all
> > pages of the node.
> 
> I have hard time understanding what you are trying to say here.
> 
> > And isolate_lru_pages has accounted skipped pages toward the scan count,
> > so the VM cannot isolate any of the eligible pages in the LRU if the LRU
> > contains a lot of non-eligible pages.
> > 
> > > 
> > > Anyway [1] has changed this behavior. Are you seeing the issue with this
> > > patch dropped?
> > 
> > Good point. Before that patch, it didn't increase the scan count for
> > skipped pages, so with [1] reverted, I guess it might work, but I worry
> > about isolating lots of skipped pages onto the temporary pages_skipped
> > list, which might cause a premature OOM. Anyway, I will test it when I
> > return to the office after my vacation.
> 
> I do not think we want to drop this patch. I think it might be good
> enough to simply fold this into the patch:
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 24efcc20af91..ac146f10f222 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1472,7 +1472,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  	LIST_HEAD(pages_skipped);
>  
>  	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
> -					!list_empty(src); scan++) {
> +					!list_empty(src);) {
>  		struct page *page;
>  
>  		page = lru_to_page(src);
> @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			continue;
>  		}
>  
> +		/*
> +		 * Do not count skipped pages because we do want to isolate
> +		 * some pages even when the LRU mostly contains ineligible
> +		 * pages
> +		 */

How about adding a comment explaining the "why"?

/*
 * Do not count skipped pages, because that can make the function return
 * with no isolated pages if the LRU mostly contains ineligible pages, so
 * the VM cannot reclaim any pages and triggers a premature OOM.
 */


> +		scan++;
>  		switch (__isolate_lru_page(page, mode)) {
>  		case 0:
>  			nr_pages = hpage_nr_pages(page);

Confirmed. It works as expected, but it changes the scan counter's behavior.
How about this?


diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2314aca47d12..846922d7942e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1469,7 +1469,7 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
  *
  * Appropriate locks must be held before calling this function.
  *
- * @nr_to_scan:	The number of pages to look through on the list.
+ * @nr_to_scan:	The number of eligible pages to look through on the list.
  * @lruvec:	The LRU vector to pull pages from.
  * @dst:	The temp list to put pages on to.
  * @nr_scanned:	The number of pages that were scanned.
@@ -1489,11 +1489,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
 	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
 	unsigned long skipped = 0;
-	unsigned long scan, nr_pages;
+	unsigned long scan, total_scan, nr_pages;
 	LIST_HEAD(pages_skipped);
 
-	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
-					!list_empty(src); scan++) {
+	for (total_scan = scan = 0; scan < nr_to_scan &&
+					nr_taken < nr_to_scan &&
+					!list_empty(src);
+					total_scan++) {
 		struct page *page;
 
 		page = lru_to_page(src);
@@ -1507,6 +1509,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			continue;
 		}
 
+		/*
+		 * Do not count skipped pages, because that can make the
+		 * function return with no isolated pages if the LRU mostly
+		 * contains ineligible pages, so the VM cannot reclaim any
+		 * pages and triggers a premature OOM.
+		 */
+		scan++;
 		switch (__isolate_lru_page(page, mode)) {
 		case 0:
 			nr_pages = hpage_nr_pages(page);
@@ -1544,9 +1553,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			skipped += nr_skipped[zid];
 		}
 	}
-	*nr_scanned = scan;
+	*nr_scanned = total_scan;
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
-				    scan, skipped, nr_taken, mode, lru);
+				    total_scan, skipped, nr_taken, mode, lru);
 	update_lru_sizes(lruvec, lru, nr_zone_taken);
 	return nr_taken;
 }
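To make the behavioral difference concrete, here is a small illustrative model of the isolation loop, written outside the kernel (the function and names are made up for the sketch; 'E' stands for an eligible page and 'I' for an ineligible one, listed tail first since the kernel walks the LRU from its tail). It shows how counting skipped pages toward the scan budget can end the walk before any eligible page is reached, while the folded-in change does not:

```python
# Illustrative model (not kernel code) of isolate_lru_pages()'s
# termination logic. 'E' = eligible page, 'I' = ineligible (skipped)
# page; the list is ordered tail first, matching the LRU walk order.
def isolate(lru, nr_to_scan, count_skipped):
    taken, scan, total_scan = [], 0, 0
    for page in lru:
        if scan >= nr_to_scan or len(taken) >= nr_to_scan:
            break
        total_scan += 1
        if page == 'I':
            if count_skipped:
                scan += 1   # old behavior: a skip burns the scan budget
            continue
        scan += 1           # an eligible page was actually considered
        taken.append(page)
    return taken, total_scan

# Tail of the LRU is mostly ineligible (e.g. highmem) pages.
lru = ['I'] * 16 + ['E'] * 4
print(isolate(lru, 4, count_skipped=True))   # ([], 4): nothing isolated
print(isolate(lru, 4, count_skipped=False))  # (['E', 'E', 'E', 'E'], 20)
```

With the old counting, the walk stops after 4 skipped pages and isolates nothing; without it, `scan` only tracks eligible pages while `total_scan` (reported via *nr_scanned and the tracepoint) still records everything looked at.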


  2017-05-10  1:46             ` [PATCH] vmscan: scan pages until it finds eligible pages Minchan Kim
  2017-05-10  1:46             ` Minchan Kim
@ 2017-05-10  6:13               ` Michal Hocko
  2017-05-10  7:03                 ` Minchan Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-05-10  6:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, Andrew Morton, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
[...]
> > @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> >  			continue;
> >  		}
> >  
> > +		/*
> > +		 * Do not count skipped pages because we do want to isolate
> > +		 * some pages even when the LRU mostly contains ineligible
> > +		 * pages
> > +		 */
> 
> How about adding a comment explaining the "why"?
> 
> /*
>  * Do not count skipped pages, because that can make the function return
>  * with no isolated pages if the LRU mostly contains ineligible pages, so
>  * the VM cannot reclaim any pages and triggers a premature OOM.
>  */

I am not sure this is necessarily any better. Mentioning a premature
OOM would require a much better explanation, because the first immediate
question would be "why don't we scan those pages at priority 0?". Also,
the decision about the OOM is made at a different layer, and it might
change in the future when this no longer applies. But it is not like I
would insist...

> > +		scan++;
> >  		switch (__isolate_lru_page(page, mode)) {
> >  		case 0:
> >  			nr_pages = hpage_nr_pages(page);
> 
> Confirmed.

Hmm. I can clearly see how we could skip over too many pages and hit
small reclaim priorities too quickly but I am still scratching my head
about how we could hit the OOM killer as a result. The amount of pages
on the active anonymous list suggests that we are not able to rotate
pages quickly enough. I have to keep thinking about that.

> It works as expected, but it changes the scan counter's behavior. How
> about this?

OK, it looks good to me. I believe the main motivation of the original
patch from Johannes was to drop the magical total_skipped.
 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2314aca47d12..846922d7942e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1469,7 +1469,7 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
>   *
>   * Appropriate locks must be held before calling this function.
>   *
> - * @nr_to_scan:	The number of pages to look through on the list.
> + * @nr_to_scan:	The number of eligible pages to look through on the list.
>   * @lruvec:	The LRU vector to pull pages from.
>   * @dst:	The temp list to put pages on to.
>   * @nr_scanned:	The number of pages that were scanned.
> @@ -1489,11 +1489,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  	unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
>  	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
>  	unsigned long skipped = 0;
> -	unsigned long scan, nr_pages;
> +	unsigned long scan, total_scan, nr_pages;
>  	LIST_HEAD(pages_skipped);
>  
> -	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
> -					!list_empty(src); scan++) {
> +	for (total_scan = scan = 0; scan < nr_to_scan &&
> +					nr_taken < nr_to_scan &&
> +					!list_empty(src);
> +					total_scan++) {
>  		struct page *page;
>  
>  		page = lru_to_page(src);
> @@ -1507,6 +1509,13 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			continue;
>  		}
>  
> +		/*
> +		 * Do not count skipped pages, because that can make the
> +		 * function return with no isolated pages if the LRU mostly
> +		 * contains ineligible pages, so the VM cannot reclaim any
> +		 * pages and triggers a premature OOM.
> +		 */
> +		scan++;
>  		switch (__isolate_lru_page(page, mode)) {
>  		case 0:
>  			nr_pages = hpage_nr_pages(page);
> @@ -1544,9 +1553,9 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  			skipped += nr_skipped[zid];
>  		}
>  	}
> -	*nr_scanned = scan;
> +	*nr_scanned = total_scan;
>  	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
> -				    scan, skipped, nr_taken, mode, lru);
> +				    total_scan, skipped, nr_taken, mode, lru);
>  	update_lru_sizes(lruvec, lru, nr_zone_taken);
>  	return nr_taken;
>  }

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] vmscan: scan pages until it finds eligible pages
  2017-05-10  6:13               ` Michal Hocko
@ 2017-05-10  7:03                 ` Minchan Kim
  2017-05-10  7:22                   ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2017-05-10  7:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Andrew Morton, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Wed, May 10, 2017 at 08:13:12AM +0200, Michal Hocko wrote:
> On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> > On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
> [...]
> > > @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > >  			continue;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Do not count skipped pages because we do want to isolate
> > > +		 * some pages even when the LRU mostly contains ineligible
> > > +		 * pages
> > > +		 */
> > 
> > How about adding a comment explaining the "why"?
> > 
> > /*
> >  * Do not count skipped pages, because that can make the function return
> >  * with no isolated pages if the LRU mostly contains ineligible pages, so
> >  * the VM cannot reclaim any pages and triggers a premature OOM.
> >  */
> 
> I am not sure this is necessarily any better. Mentioning a pre-mature
> OOM would require a much better explanation because a first immediate
> question would be "why don't we scan those pages at priority 0". Also
> decision about the OOM is at a different layer and it might change in
> future when this doesn't apply any more. But it is not like I would
> insist...
> 
> > > +		scan++;
> > >  		switch (__isolate_lru_page(page, mode)) {
> > >  		case 0:
> > >  			nr_pages = hpage_nr_pages(page);
> > 
> > Confirmed.
> 
> Hmm. I can clearly see how we could skip over too many pages and hit
> small reclaim priorities too quickly but I am still scratching my head
> about how we could hit the OOM killer as a result. The amount of pages
> on the active anonymous list suggests that we are not able to rotate
> pages quickly enough. I have to keep thinking about that.

I explained it, but it seems that was not enough. Let me try again.

The problem is that get_scan_count determines nr_to_scan based on the
eligible zones only.

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
        size = size >> sc->priority;

Assume sc->priority is 0 and the LRU list is as follows.

        N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(I.e., a few eligible pages are at the head of the LRU, but the others
are almost all ineligible pages.)

In that case, size becomes 4, so the VM wants to scan 4 pages, but the 4
pages at the tail of the LRU are not eligible pages.
If isolate_lru_pages counts the skipped pages toward the scan count, it
doesn't reclaim the remaining pages after scanning those 4 pages.

If this makes the problem easier to understand, I will add it to
the description.
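To make the arithmetic concrete, here is a short sketch (illustrative only; `scan_window` is a made-up stand-in for the `size >> sc->priority` computation quoted above, and 'N'/'H' follow the diagram) of why the scan window never covers the ineligible tail:

```python
# Illustrative sketch (not kernel code): get_scan_count() sizes the scan
# window from the *eligible* LRU pages only, so at priority 0 the window
# is 4 pages even though the node's LRU holds 20.
def scan_window(eligible_lru_size, priority):
    return eligible_lru_size >> priority

lru = ['N'] * 4 + ['H'] * 16   # head .. tail, as in the diagram above
eligible = lru.count('N')

window = scan_window(eligible, 0)
print(window)  # 4: isolate_lru_pages() starts at the tail, so a window
               # of 4 sees only 'H' pages if every skip consumes it
```

So even a priority-0 "full scan" is bounded by the 4 eligible pages, not by the 20 pages actually on the node's LRU.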


* Re: [PATCH] vmscan: scan pages until it finds eligible pages
  2017-05-10  7:03                 ` Minchan Kim
@ 2017-05-10  7:22                   ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-05-10  7:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, Andrew Morton, Mel Gorman, kernel-team,
	linux-kernel, linux-mm

On Wed 10-05-17 16:03:11, Minchan Kim wrote:
> On Wed, May 10, 2017 at 08:13:12AM +0200, Michal Hocko wrote:
> > On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> > > On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
[...]
> > > > +		scan++;
> > > >  		switch (__isolate_lru_page(page, mode)) {
> > > >  		case 0:
> > > >  			nr_pages = hpage_nr_pages(page);
> > > 
> > > Confirmed.
> > 
> > Hmm. I can clearly see how we could skip over too many pages and hit
> > small reclaim priorities too quickly but I am still scratching my head
> > about how we could hit the OOM killer as a result. The amount of pages
> > on the active anonymous list suggests that we are not able to rotate
> > pages quickly enough. I have to keep thinking about that.
> 
> I explained it but seems to be not enouggh. Let me try again.
> 
> The problem is that get_scan_count determines nr_to_scan with
> eligible zones.
> 
>         size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
>         size = size >> sc->priority;

Ohh, right. Who has done that ;) Now it is much clearer. We simply
reclaimed all the pages on the inactive LRU list, made only very slow
progress over the active list, and hit the OOM before we could actually
reach anything. I completely forgot that the scan window is not the full
LRU list.

Thanks for bearing with me!
-- 
Michal Hocko
SUSE Labs


Thread overview: 10+ messages
     [not found] <1493700038-27091-1-git-send-email-minchan@kernel.org>
2017-05-02  5:14 ` [PATCH] vmscan: scan pages until it finds eligible pages Minchan Kim
2017-05-02  7:54   ` Michal Hocko
2017-05-02 14:51     ` Minchan Kim
2017-05-02 15:14       ` Michal Hocko
2017-05-03  4:48         ` Minchan Kim
2017-05-03  6:00           ` Michal Hocko
2017-05-10  1:46             ` Minchan Kim
2017-05-10  6:13               ` Michal Hocko
2017-05-10  7:03                 ` Minchan Kim
2017-05-10  7:22                   ` Michal Hocko
