From: Minchan Kim <minchan@kernel.org>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Vlastimil Babka <vbabka@suse.cz>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v1
Date: Thu, 21 Jul 2016 17:39:56 +0900	[thread overview]
Message-ID: <20160721083956.GB8356@bbox> (raw)
In-Reply-To: <20160721073156.GC27554@js1304-P5Q-DELUXE>

On Thu, Jul 21, 2016 at 04:31:56PM +0900, Joonsoo Kim wrote:
> On Wed, Jul 20, 2016 at 04:21:46PM +0100, Mel Gorman wrote:
> > Both Joonsoo Kim and Minchan Kim have reported premature OOM kills on
> > a 32-bit platform. The common element is a zone-constrained high-order
> > allocation failing. Two factors appear to be at fault -- pgdat being
> > considered unreclaimable prematurely and insufficient rotation of the
> > active list.
> > 
> > Unfortunately, to date I have been unable to reproduce this with a
> > variety of stress workloads on a 2G 32-bit KVM instance. It's not
> > clear why, as the steps are similar to what was described. That means
> > I've been unable to determine whether this series addresses the
> > problem. I'm hoping they can test and report back before these are
> > merged to mmotm. What I have checked is that a basic parallel dd
> > workload completed successfully on the same machine I used for the
> > node-lru performance tests. I'll leave the other tests running just
> > in case anything interesting falls out.
> 
> Hello, Mel.
> 
> I tested this series and it doesn't solve my problem. But with this
> series plus the one change below, my problem is solved.
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f5ab357..d451c29 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1819,7 +1819,7 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
>  
>                 nr_pages = hpage_nr_pages(page);
>                 update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> -               list_move(&page->lru, &lruvec->lists[lru]);
> +               list_move_tail(&page->lru, &lruvec->lists[lru]);
>                 pgmoved += nr_pages;
>  
>                 if (put_page_testzero(page)) {
> 
> It is a brain-dead workaround, so it would be better if you could
> find a proper solution.

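FWIW, I think the reason your one-liner helps is that
isolate_lru_pages() takes pages from the tail of the inactive list
(lru_to_page() reads head->prev), so pages deactivated with
list_move_tail() are the first thing reclaim scans, while list_move()
to the head parks them behind every page already on the inactive list.
For a zone-constrained request that matters, because the freshly
deactivated lowmem pages are exactly the ones reclaim needs to reach
quickly. The userspace toy below (not kernel code; the list helpers
are reimplemented here just for illustration) shows the ordering
difference:

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

/* Insert at the head, which is where list_move() puts a page. */
static void list_add_head(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

/* Insert at the tail, which is where list_move_tail() puts a page. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}

struct page { int id; struct list_head lru; };
#define lru_to_page(pos) \
	((struct page *)((char *)(pos) - offsetof(struct page, lru)))

int main(void)
{
	struct list_head inactive;
	struct page p[4] = { { .id = 0 }, { .id = 1 }, { .id = 2 }, { .id = 3 } };
	struct list_head *pos;

	list_init(&inactive);
	list_add_head(&p[0].lru, &inactive);	/* old inactive pages */
	list_add_head(&p[1].lru, &inactive);
	list_add_head(&p[2].lru, &inactive);	/* deactivated, list_move() */
	list_add_tail(&p[3].lru, &inactive);	/* deactivated, list_move_tail() */

	/* Reclaim scans from the tail, i.e. inactive.prev first. */
	for (pos = inactive.prev; pos != &inactive; pos = pos->prev)
		printf("reclaim scans page %d\n", lru_to_page(pos)->id);
	/* Page 3 (tail-moved) is scanned first; page 2 (head-moved) last. */
	return 0;
}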
I roughly tested the patch below, and it improved performance a lot.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cd68a18..9061e5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1809,7 +1809,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 static void move_active_pages_to_lru(struct lruvec *lruvec,
 				     struct list_head *list,
 				     struct list_head *pages_to_free,
-				     enum lru_list lru)
+				     enum lru_list lru,
+				     bool tail)
 {
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	unsigned long pgmoved = 0;
@@ -1825,7 +1826,10 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 
 		nr_pages = hpage_nr_pages(page);
 		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
-		list_move(&page->lru, &lruvec->lists[lru]);
+		if (!tail)
+			list_move(&page->lru, &lruvec->lists[lru]);
+		else
+			list_move_tail(&page->lru, &lruvec->lists[lru]);
 		pgmoved += nr_pages;
 
 		if (put_page_testzero(page)) {
@@ -1847,6 +1851,46 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 		__count_vm_events(PGDEACTIVATE, pgmoved);
 }
 
+static bool inactive_list_is_extreme_low(struct lruvec *lruvec, bool file,
+						struct scan_control *sc)
+{
+	unsigned long inactive;
+
+	/*
+	 * If we don't have swap space, anonymous page deactivation
+	 * is pointless.
+	 */
+	if (!file && !total_swap_pages)
+		return false;
+
+	inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
+
+	/*
+	 * For global reclaim on zone-constrained allocations, it is necessary
+	 * to check if rotations are required for lowmem to be reclaimed. This
+	 * calculates the inactive/active pages available in eligible zones.
+	 */
+	if (global_reclaim(sc)) {
+		struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+		int zid;
+
+		for (zid = sc->reclaim_idx + 1; zid < MAX_NR_ZONES; zid++) {
+			struct zone *zone = &pgdat->node_zones[zid];
+			unsigned long inactive_zone;
+
+			if (!populated_zone(zone))
+				continue;
+
+			inactive_zone = zone_page_state(zone,
+					NR_ZONE_LRU_BASE + (file * LRU_FILE));
+
+			inactive -= min(inactive, inactive_zone);
+		}
+	}
+
+	return inactive <= (SWAP_CLUSTER_MAX * num_online_cpus());
+}
+
 static void shrink_active_list(unsigned long nr_to_scan,
 			       struct lruvec *lruvec,
 			       struct scan_control *sc,
@@ -1937,9 +1982,11 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	 * get_scan_count.
 	 */
 	reclaim_stat->recent_rotated[file] += nr_rotated;
+	move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru, false);
+	move_active_pages_to_lru(lruvec, &l_inactive,
+		&l_hold, lru - LRU_ACTIVE,
+		inactive_list_is_extreme_low(lruvec, is_file_lru(lru), sc));
 
-	move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
-	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&pgdat->lru_lock);
 

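To explain the heuristic: the node-wide inactive count minus whatever
sits in zones above sc->reclaim_idx approximates how much a
zone-constrained allocation can really reclaim, and once that
remainder drops to SWAP_CLUSTER_MAX * num_online_cpus() or below,
newly deactivated pages go to the tail of the inactive list so reclaim
reaches them immediately. A rough userspace sketch of the arithmetic
(the zone layout, page counts and CPU count are made-up example
numbers, not measurements):

#include <stdio.h>
#include <stdbool.h>

#define SWAP_CLUSTER_MAX 32UL

/* Made-up inactive-page counts for a small 32-bit box:
 * zone 0 = DMA, zone 1 = NORMAL, zone 2 = HIGHMEM. */
static const unsigned long zone_inactive[3] = { 100, 300, 50000 };

static bool inactive_extreme_low(int reclaim_idx, unsigned long ncpus)
{
	unsigned long inactive = 0;
	int zid;

	/* Node-wide inactive total. */
	for (zid = 0; zid < 3; zid++)
		inactive += zone_inactive[zid];

	/* Subtract zones the allocation is not allowed to use. */
	for (zid = reclaim_idx + 1; zid < 3; zid++)
		inactive -= zone_inactive[zid];

	return inactive <= SWAP_CLUSTER_MAX * ncpus;
}

int main(void)
{
	/* A GFP_KERNEL request on 32-bit reclaims up to ZONE_NORMAL
	 * (reclaim_idx = 1), so the 50000 highmem pages are ignored:
	 * 400 eligible inactive pages against a 32 * 4 = 128 threshold
	 * on a 4-CPU machine.  Prints "no" here, but shrink the lowmem
	 * inactive pages below 128 and it flips to "yes". */
	printf("extreme low? %s\n",
	       inactive_extreme_low(1, 4) ? "yes" : "no");
	return 0;
}

With highmem excluded, the eligible inactive list can be nearly empty
even though the node-wide count looks healthy, which is why the
node-wide numbers alone rotated too slowly for your workload.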