linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Ondrej Kozina <okozina@redhat.com>,
	Jerome Marchand <jmarchan@redhat.com>,
	Stanislav Kozina <skozina@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: System freezes after OOM
Date: Fri, 15 Jul 2016 10:23:47 +0200	[thread overview]
Message-ID: <20160715082347.GC11811@dhcp22.suse.cz> (raw)
In-Reply-To: <20160715072242.GB11811@dhcp22.suse.cz>

Let me paste the patch with the full changelog and the explanation so
that we can reason about it more easily. If I am making some false
assumptions then please point them out.
--- 
>From ed46e3f7f5a6e896331eeadc9d09e2796acb3d01 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 14 Jul 2016 19:31:54 +0200
Subject: [PATCH] mempool: do not consume memory reserves from the reclaim path

There has been a report about OOM killer invoked when swapping out to
a dm-crypt device. The primary reason seems to be that the swapout
out IO managed to completely deplete memory reserves. Mikulas was
able to bisect and explained the issue by pointing to f9054c70d28b
("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").

The reason is that the swapout path is not throttled properly because
the md layer needs to allocate from the generic_make_request path
which means it allocates from the PF_MEMALLOC context. dm layer uses
mempool_alloc in order to guarantee a forward progress which used to
inhibit access to memory reserves when using page allocator. This has
changed by f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if
there are free elements") which has dropped the __GFP_NOMEMALLOC
protection when the memory pool is depleted.

If we are running out of memory and the only way forward to free memory
is to perform swapout we just keep consuming memory reserves rather than
throttling the mempool allocations and allowing the pending IO to
complete up to a moment when the memory is depleted completely and there
is no way forward but invoking the OOM killer. This is less than
optimal.

The original intention of f9054c70d28b was to help with the OOM
situations where the oom victim depends on mempool allocation to make a
forward progress. We can handle that case in a different way, though. We
can check whether the current task has access to memory reserves ad an
OOM victim (TIF_MEMDIE) and drop __GFP_NOMEMALLOC protection if the pool
is empty.

David Rientjes was objecting that such an approach wouldn't help if the
oom victim was blocked on a lock held by process doing mempool_alloc. This
is very similar to other oom deadlock situations and we have oom_reaper
to deal with them so it is reasonable to rely on the same mechanism
rather inventing a different one which has negative side effects.

Fixes: f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements")
Bisected-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/mempool.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index 8f65464da5de..ea26d75c8adf 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -322,20 +322,20 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
 	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
+	gfp_mask |= __GFP_NOMEMALLOC;   /* don't allocate emergency reserves */
 	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 
 	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
 repeat_alloc:
-	if (likely(pool->curr_nr)) {
-		/*
-		 * Don't allocate from emergency reserves if there are
-		 * elements available.  This check is racy, but it will
-		 * be rechecked each loop.
-		 */
-		gfp_temp |= __GFP_NOMEMALLOC;
-	}
+	/*
+	 * Make sure that the OOM victim will get access to memory reserves
+	 * properly if there are no objects in the pool to prevent from
+	 * livelocks.
+	 */
+	if (!likely(pool->curr_nr) && test_thread_flag(TIF_MEMDIE))
+		gfp_temp &= ~__GFP_NOMEMALLOC;
 
 	element = pool->alloc(gfp_temp, pool->pool_data);
 	if (likely(element != NULL))
@@ -359,7 +359,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	 * We use gfp mask w/o direct reclaim or IO for the first round.  If
 	 * alloc failed with that and @pool was empty, retry immediately.
 	 */
-	if ((gfp_temp & ~__GFP_NOMEMALLOC) != gfp_mask) {
+	if ((gfp_temp & __GFP_DIRECT_RECLAIM) != (gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		gfp_temp = gfp_mask;
 		goto repeat_alloc;
-- 
2.8.1

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2016-07-15  8:23 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <57837CEE.1010609@redhat.com>
     [not found] ` <f80dc690-7e71-26b2-59a2-5a1557d26713@redhat.com>
     [not found]   ` <9be09452-de7f-d8be-fd5d-4a80d1cd1ba3@redhat.com>
2016-07-11 15:43     ` System freezes after OOM Mikulas Patocka
2016-07-12  6:49       ` Michal Hocko
2016-07-12 23:44         ` Mikulas Patocka
2016-07-13  8:35           ` Jerome Marchand
2016-07-13 11:14             ` Michal Hocko
2016-07-13 14:21               ` Mikulas Patocka
2016-07-13 11:10           ` Michal Hocko
2016-07-13 12:50             ` Michal Hocko
2016-07-13 13:44               ` Milan Broz
2016-07-13 15:21                 ` Mikulas Patocka
2016-07-14  9:09                   ` Michal Hocko
2016-07-14  9:46                     ` Milan Broz
2016-07-13 15:02             ` Mikulas Patocka
2016-07-14 10:51               ` [dm-devel] " Ondrej Kozina
2016-07-14 12:51               ` Michal Hocko
2016-07-14 14:00                 ` Mikulas Patocka
2016-07-14 14:59                   ` Michal Hocko
2016-07-14 15:25                     ` Ondrej Kozina
2016-07-14 17:35                     ` Mikulas Patocka
2016-07-15  8:35                       ` Michal Hocko
2016-07-15 12:11                         ` Mikulas Patocka
2016-07-15 12:22                           ` Michal Hocko
2016-07-15 17:02                             ` Mikulas Patocka
2016-07-18  7:22                               ` Michal Hocko
2016-07-14 14:08                 ` Ondrej Kozina
2016-07-14 15:31                   ` Michal Hocko
2016-07-14 17:07                     ` Ondrej Kozina
2016-07-14 17:36                       ` Michal Hocko
2016-07-14 17:39                         ` Michal Hocko
2016-07-15 11:42                       ` Tetsuo Handa
2016-07-13 13:19           ` Tetsuo Handa
2016-07-13 13:39             ` Michal Hocko
2016-07-13 14:18               ` Mikulas Patocka
2016-07-13 14:56                 ` Michal Hocko
2016-07-13 15:11                   ` Mikulas Patocka
2016-07-13 23:53                     ` David Rientjes
2016-07-14 11:01                       ` Tetsuo Handa
2016-07-14 12:29                         ` Mikulas Patocka
2016-07-14 20:26                         ` David Rientjes
2016-07-14 21:40                           ` Tetsuo Handa
2016-07-14 22:04                             ` David Rientjes
2016-07-15 11:25                           ` Mikulas Patocka
2016-07-15 21:21                             ` David Rientjes
2016-07-14 12:27                       ` Mikulas Patocka
2016-07-14 20:22                         ` David Rientjes
2016-07-15 11:21                           ` Mikulas Patocka
2016-07-15 21:25                             ` David Rientjes
2016-07-15 21:39                               ` Mikulas Patocka
2016-07-15 21:58                                 ` David Rientjes
2016-07-15 23:53                                   ` Mikulas Patocka
2016-07-18 15:14                             ` Johannes Weiner
2016-07-14 15:29                       ` Michal Hocko
2016-07-14 20:38                         ` David Rientjes
2016-07-15  7:22                           ` Michal Hocko
2016-07-15  8:23                             ` Michal Hocko [this message]
2016-07-15 12:00                             ` Mikulas Patocka
2016-07-15 21:47                             ` David Rientjes
2016-07-18  7:39                               ` Michal Hocko
2016-07-18 21:03                                 ` David Rientjes
2016-07-14  0:01             ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160715082347.GC11811@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=jmarchan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mpatocka@redhat.com \
    --cc=okozina@redhat.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=rientjes@google.com \
    --cc=skozina@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).