linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Joonsoo Kim <js1304@gmail.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Vlastimil Babka <vbabka@suse.cz>, <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [PATCH 11/14] mm: throttle on IO only when there are too many dirty and writeback pages
Date: Wed, 20 Apr 2016 15:47:24 -0400
Message-ID: <1461181647-8039-12-git-send-email-mhocko@kernel.org>
In-Reply-To: <1461181647-8039-1-git-send-email-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

wait_iff_congested has been used to throttle the allocator before it
retries another round of direct reclaim, to allow writeback to make
some progress and to prevent reclaim from looping over dirty/writeback
pages without making any progress. We used to do congestion_wait before
0e093d99763e ("writeback: do not sleep on the congestion queue if
there are no congested BDIs or if significant congestion is not being
encountered in the current zone") but that led to undesirable stalls
and sleeping for the full timeout even when the BDI wasn't congested.
Hence wait_iff_congested was used instead. But it seems that even
wait_iff_congested doesn't work as expected. We might have a small file
LRU list with all pages dirty/writeback while the BDI is not congested,
so the call ends up being just a cond_resched, which can trigger a
premature OOM.

This patch replaces the unconditional wait_iff_congested with
congestion_wait, which is executed only if we _know_ that the last round
of direct reclaim didn't make any progress and dirty+writeback pages
make up more than half of the reclaimable pages in the zone which might
be usable for our target allocation. This shouldn't reintroduce the
stalls fixed by 0e093d99763e because congestion_wait is called only when
we are getting hopeless and sleeping is a better choice than an OOM with
many pages under IO.

We have to preserve the logic introduced by 373ccbe59270 ("mm, vmstat:
allow WQ concurrency to discover memory reclaim doesn't make any
progress") in the __alloc_pages_slowpath retry logic
(should_reclaim_retry) now that wait_iff_congested is not used there
anymore. As the only remaining user of wait_iff_congested is
shrink_inactive_list, we can remove the WQ-specific short sleep from
wait_iff_congested because the sleep needs to be done only once in
the allocation retry cycle.

Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
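For readers who want to see the new retry-time decision in isolation,
here is a minimal user-space sketch of it. It is only a model, not
kernel code: struct zone_model, enum retry_wait and pick_retry_wait()
are hypothetical names invented for this illustration, and the three
enum values merely stand in for the calls made in should_reclaim_retry
once the watermark check for the zone has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the wait chosen before retrying reclaim. */
enum retry_wait {
	RETRY_CONGESTION_WAIT,	/* stands in for congestion_wait(BLK_RW_ASYNC, HZ/10) */
	RETRY_SHORT_SLEEP,	/* stands in for schedule_timeout_uninterruptible(1) */
	RETRY_COND_RESCHED,	/* stands in for cond_resched() */
};

/* Hypothetical per-zone counters, all in pages. */
struct zone_model {
	unsigned long reclaimable;	/* models zone_reclaimable_pages() */
	unsigned long dirty;		/* models NR_FILE_DIRTY */
	unsigned long writeback;	/* models NR_WRITEBACK */
};

/*
 * Decide how to wait before the next reclaim retry.
 *
 * Throttle on IO only when the last reclaim round made no progress and
 * dirty + writeback pages exceed half of the reclaimable pages.
 * Otherwise a WQ worker takes a short sleep so that the WQ concurrency
 * control can notice the stall, and everybody else just yields.
 */
enum retry_wait pick_retry_wait(const struct zone_model *z,
				bool did_some_progress,
				bool is_wq_worker)
{
	assert(z != NULL);

	if (!did_some_progress &&
	    2 * (z->writeback + z->dirty) > z->reclaimable)
		return RETRY_CONGESTION_WAIT;

	if (is_wq_worker)
		return RETRY_SHORT_SLEEP;

	return RETRY_COND_RESCHED;
}
```

With reclaimable = 100, dirty = 30 and writeback = 25, a round without
progress gives 2 * 55 = 110 > 100, so the model picks the congestion
wait. The same zone after a round that did make progress falls through
to the short sleep (WQ worker) or the plain yield (any other task).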
 mm/backing-dev.c | 20 +++-----------------
 mm/page_alloc.c  | 39 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index bfbd7096b6ed..08e3a58628ed 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -957,9 +957,8 @@ EXPORT_SYMBOL(congestion_wait);
  * jiffies for either a BDI to exit congestion of the given @sync queue
  * or a write to complete.
  *
- * In the absence of zone congestion, a short sleep or a cond_resched is
- * performed to yield the processor and to allow other subsystems to make
- * a forward progress.
+ * In the absence of zone congestion, cond_resched() is called to yield
+ * the processor if necessary but otherwise does not sleep.
  *
  * The return value is 0 if the sleep is for the full timeout. Otherwise,
  * it is the number of jiffies that were still remaining when the function
@@ -979,20 +978,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 	 */
 	if (atomic_read(&nr_wb_congested[sync]) == 0 ||
 	    !test_bit(ZONE_CONGESTED, &zone->flags)) {
-
-		/*
-		 * Memory allocation/reclaim might be called from a WQ
-		 * context and the current implementation of the WQ
-		 * concurrency control doesn't recognize that a particular
-		 * WQ is congested if the worker thread is looping without
-		 * ever sleeping. Therefore we have to do a short sleep
-		 * here rather than calling cond_resched().
-		 */
-		if (current->flags & PF_WQ_WORKER)
-			schedule_timeout_uninterruptible(1);
-		else
-			cond_resched();
-
+		cond_resched();
 		/* In case we scheduled, work out time remaining */
 		ret = timeout - (jiffies - start);
 		if (ret < 0)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 38302c2041a3..3b78936eca70 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3195,8 +3195,9 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 					ac->nodemask) {
 		unsigned long available;
+		unsigned long reclaimable;
 
-		available = zone_reclaimable_pages(zone);
+		available = reclaimable = zone_reclaimable_pages(zone);
 		available -= DIV_ROUND_UP(no_progress_loops * available,
 					  MAX_RECLAIM_RETRIES);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
@@ -3207,8 +3208,40 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		 */
 		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
 				ac->high_zoneidx, alloc_flags, available)) {
-			/* Wait for some write requests to complete then retry */
-			wait_iff_congested(zone, BLK_RW_ASYNC, HZ/50);
+			/*
+			 * If we didn't make any progress and have a lot of
+			 * dirty + writeback pages then we should wait for
+			 * an IO to complete to slow down the reclaim and
+			 * prevent a premature OOM.
+			 */
+			if (!did_some_progress) {
+				unsigned long writeback;
+				unsigned long dirty;
+
+				writeback = zone_page_state_snapshot(zone,
+								     NR_WRITEBACK);
+				dirty = zone_page_state_snapshot(zone, NR_FILE_DIRTY);
+
+				if (2*(writeback + dirty) > reclaimable) {
+					congestion_wait(BLK_RW_ASYNC, HZ/10);
+					return true;
+				}
+			}
+
+			/*
+			 * Memory allocation/reclaim might be called from a WQ
+			 * context and the current implementation of the WQ
+			 * concurrency control doesn't recognize that
+			 * a particular WQ is congested if the worker thread is
+			 * looping without ever sleeping. Therefore we have to
+			 * do a short sleep here rather than calling
+			 * cond_resched().
+			 */
+			if (current->flags & PF_WQ_WORKER)
+				schedule_timeout_uninterruptible(1);
+			else
+				cond_resched();
+
 			return true;
 		}
 	}
-- 
2.8.0.rc3

