From: Michal Hocko <mhocko@kernel.org>
To: <linux-mm@kvack.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [RFC PATCH 2/2] mm, vmscan: do not loop on too_many_isolated for ever
Date: Wed, 18 Jan 2017 14:44:53 +0100	[thread overview]
Message-ID: <20170118134453.11725-3-mhocko@kernel.org> (raw)
In-Reply-To: <20170118134453.11725-1-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

Tetsuo Handa has reported [1] that direct reclaimers might get stuck in
the too_many_isolated loop essentially forever, because the last few
pages on the LRU lists are isolated by kswapd, which is itself stuck on
fs locks while doing the pageout. This in turn means that nobody can
trigger the OOM killer and the system becomes essentially unusable.

too_many_isolated was introduced by commit 35cd78156c49 ("vmscan:
throttle direct reclaim when too many pages are isolated already") to
prevent premature OOM killer invocations, because back then a lack of
reclaim progress could indeed trigger the OOM killer too early. But
since the OOM detection rework in 0a0337e0d1d1 ("mm, oom: rework oom
detection") the allocation/reclaim retry loop considers all the
reclaimable pages, including those which are isolated - see
9f6c399ddc36 ("mm, vmscan: consider isolated pages in
zone_reclaimable_pages") - so we can loosen the direct reclaim
throttling and instead rely on the should_reclaim_retry logic, which is
the proper layer to control how reclaim attempts are throttled and
retried.
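
For reference, the throttling test being relaxed here has roughly the
following shape. This is a simplified sketch of the pre-series upstream
helper (which takes a file/anon flag; patch 1/2 reworks it to consult
the per-zone NR_ZONE_ISOLATED_* counters instead), not the exact code:

static int too_many_isolated(struct pglist_data *pgdat, int file,
			     struct scan_control *sc)
{
	unsigned long inactive, isolated;

	/* kswapd itself is never throttled here */
	if (current_is_kswapd())
		return 0;

	if (file) {
		inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
		isolated = node_page_state(pgdat, NR_ISOLATED_FILE);
	} else {
		inactive = node_page_state(pgdat, NR_INACTIVE_ANON);
		isolated = node_page_state(pgdat, NR_ISOLATED_ANON);
	}

	/*
	 * GFP_NOIO/GFP_NOFS callers may hold locks which block the
	 * pageout, so they are allowed to isolate more pages; only
	 * callers which can do both IO and FS see the tighter limit.
	 */
	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
		inactive >>= 3;

	return isolated > inactive;
}

A direct reclaimer is throttled whenever more pages are isolated from
an LRU than remain on the corresponding inactive list; with the LRU
nearly empty and the isolating task blocked, that condition can hold
indefinitely, which is exactly the livelock described above.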

Move the too_many_isolated check out of shrink_inactive_list and into
shrink_list, because the active list might theoretically see too many
isolated pages as well.
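
To illustrate where the forward progress guarantee now lives:
should_reclaim_retry keeps the allocator retrying as long as freeing
the reclaimable pages - which, after patch 1/2, include the isolated
ones - could get a zone over its min watermark. Its per-zone core
boils down to something like the following condensed distillation
(reclaim_retry_zone is a hypothetical name used for illustration; the
real function iterates the zonelist and additionally handles costly
orders and OOM escalation):

static bool reclaim_retry_zone(struct zone *zone, unsigned int order,
			       int alloc_flags, int classzone_idx,
			       int no_progress_loops)
{
	unsigned long available;

	/* isolated pages count as reclaimable since 9f6c399ddc36 */
	available = zone_reclaimable_pages(zone);

	/* back off the estimate as retries keep failing */
	available -= DIV_ROUND_UP(no_progress_loops * available,
				  MAX_RECLAIM_RETRIES);
	available += zone_page_state_snapshot(zone, NR_FREE_PAGES);

	/*
	 * Keep retrying as long as reclaiming everything that is
	 * reclaimable could plausibly satisfy the min watermark.
	 */
	return __zone_watermark_ok(zone, order, min_wmark_pages(zone),
				   classzone_idx, alloc_flags, available);
}

Because isolated pages stay in this estimate until they are put back
or freed, a direct reclaimer which bails out of the relaxed throttling
loop below does not cause a premature OOM declaration.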

[1] http://lkml.kernel.org/r/201602092349.ACG81273.OSVtMJQHLOFOFF@I-love.SAKURA.ne.jp
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmscan.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4b1ed1b1f1db..9f6be3b10ff0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -204,10 +204,12 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
 	unsigned long nr;
 
 	nr = zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_FILE) +
-		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE);
+		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE) +
+		zone_page_state_snapshot(zone, NR_ZONE_ISOLATED_FILE);
 	if (get_nr_swap_pages() > 0)
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
-			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
+			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON) +
+			zone_page_state_snapshot(zone, NR_ZONE_ISOLATED_ANON);
 
 	return nr;
 }
@@ -1728,14 +1730,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 
-	while (unlikely(too_many_isolated(pgdat, lru, sc))) {
-		congestion_wait(BLK_RW_ASYNC, HZ/10);
-
-		/* We are about to die and free our memory. Return now. */
-		if (fatal_signal_pending(current))
-			return SWAP_CLUSTER_MAX;
-	}
-
 	lru_add_drain();
 
 	if (!sc->may_unmap)
@@ -2083,6 +2077,29 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 				 struct lruvec *lruvec, struct scan_control *sc)
 {
+	int stalled = false;
+
+	/* We are about to die and free our memory. Return now. */
+	if (fatal_signal_pending(current))
+		return SWAP_CLUSTER_MAX;
+
+	/*
+	 * throttle direct reclaimers but do not loop for ever. We rely
+	 * on should_reclaim_retry to not allow pre-mature OOM when
+	 * there are too many pages under reclaim.
+	 */
+	while (too_many_isolated(lruvec_pgdat(lruvec), lru, sc)) {
+		if (stalled)
+			return 0;
+
+		/*
+		 * TODO we should wait on a different event here - do the wake up
+		 * after we decrement NR_ZONE_ISOLATED_*
+		 */
+		congestion_wait(BLK_RW_ASYNC, HZ/10);
+		stalled = true;
+	}
+
 	if (is_active_lru(lru)) {
 		if (inactive_list_is_low(lruvec, is_file_lru(lru), sc, true))
 			shrink_active_list(nr_to_scan, lruvec, sc, lru);
-- 
2.11.0

Thread overview: 110+ messages
2017-01-18 13:44 [RFC PATCH 0/2] fix unbounded too_many_isolated Michal Hocko
2017-01-18 13:44 ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Michal Hocko
2017-01-18 14:46   ` Mel Gorman
2017-01-18 15:15     ` Michal Hocko
2017-01-18 15:54       ` Mel Gorman
2017-01-18 16:17         ` Michal Hocko
2017-01-18 17:00           ` Mel Gorman
2017-01-18 17:29             ` Michal Hocko
2017-01-19 10:07               ` Mel Gorman
2017-01-19 11:23                 ` Michal Hocko
2017-01-19 13:11                   ` Mel Gorman
2017-01-20 13:27                     ` Tetsuo Handa
2017-01-21  7:42                       ` Tetsuo Handa
2017-01-25 10:15                         ` Michal Hocko
2017-01-25 10:19                           ` Christoph Hellwig
2017-01-25 10:46                             ` Michal Hocko
2017-01-25 11:09                               ` Tetsuo Handa
2017-01-25 13:00                                 ` Michal Hocko
2017-01-27 14:49                                   ` Michal Hocko
2017-01-28 15:27                                     ` Tetsuo Handa
2017-01-30  8:55                                       ` Michal Hocko
2017-02-02 10:14                                         ` Michal Hocko
2017-02-03 10:57                                           ` Tetsuo Handa
2017-02-03 14:41                                             ` Michal Hocko
2017-02-03 14:50                                             ` Michal Hocko
2017-02-03 17:24                                               ` Brian Foster
2017-02-06  6:29                                                 ` Tetsuo Handa
2017-02-06 14:35                                                   ` Brian Foster
2017-02-06 14:42                                                     ` Michal Hocko
2017-02-06 15:47                                                       ` Brian Foster
2017-02-07 10:30                                                     ` Tetsuo Handa
2017-02-07 16:54                                                       ` Brian Foster
2017-02-03 14:55                                             ` Michal Hocko
2017-02-05 10:43                                               ` Tetsuo Handa
2017-02-06 10:34                                                 ` Michal Hocko
2017-02-06 10:39                                                 ` Michal Hocko
2017-02-07 21:12                                                   ` Michal Hocko
2017-02-08  9:24                                                     ` Peter Zijlstra
2017-02-21  9:40                                             ` Michal Hocko
2017-02-21 14:35                                               ` Tetsuo Handa
2017-02-21 15:53                                                 ` Michal Hocko
2017-02-22  2:02                                                   ` Tetsuo Handa
2017-02-22  7:54                                                     ` Michal Hocko
2017-02-26  6:30                                                       ` Tetsuo Handa
2017-01-31 11:58                                   ` Michal Hocko
2017-01-31 12:51                                     ` Christoph Hellwig
2017-01-31 13:21                                       ` Michal Hocko
2017-01-25 10:33                           ` [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone Tetsuo Handa
2017-01-25 12:34                             ` Michal Hocko
2017-01-25 13:13                               ` Tetsuo Handa
2017-01-25  9:53                       ` Michal Hocko
2017-01-20  6:42                 ` Hillf Danton
2017-01-20  9:25                   ` Mel Gorman
2017-01-18 13:44 ` [RFC PATCH 2/2] mm, vmscan: do not loop on too_many_isolated for ever Michal Hocko [this message]
2017-01-18 14:50   ` Mel Gorman
