All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vasily Averin <vvs@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel@openvz.org
Subject: [PATCH mm] vmalloc: back off when the current task is OOM-killed
Date: Fri, 17 Sep 2021 11:06:49 +0300	[thread overview]
Message-ID: <d07a5540-3e07-44ba-1e59-067500f024d9@virtuozzo.com> (raw)
In-Reply-To: <YT8PEBbYZhLixEJD@dhcp22.suse.cz>

Huge vmalloc allocation on heavy loaded node can lead to a global
memory shortage. A task called vmalloc can have the worst badness
and be chosen by OOM-killer, however received fatal signal and
oom victim mark does not interrupt allocation cycle. Vmalloc will
continue allocating pages over and over again, exacerbating the crisis
and consuming the memory freed up by another killed tasks.

This patch allows OOM-killer to break vmalloc cycle, makes OOM more
effective and avoid host panic.

Unfortunately it is not 100% safe. Previous attempt to break vmalloc
cycle was reverted by commit b8c8a338f75e ("Revert "vmalloc: back off when
the current task is killed"") due to some vmalloc callers did not handled
failures properly. Found issues was resolved, however, there may
be other similar places.

Such failures may be acceptable for emergencies, such as OOM. On the other
hand, we would like to detect them earlier. However they are quite rare,
and will be hidden by OOM messages, so I'm afraid they wikk have quite
small chance of being noticed and reported.

To improve the detection of such places this patch also interrupts the vmalloc
allocation cycle for all fatal signals. The checks are hidden under DEBUG_VM
config option to do not break unaware production kernels.

Vmalloc uses new alloc_pages_bulk subsystem, so newly added checks can
affect other users of this subsystem.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 mm/page_alloc.c | 5 +++++
 mm/vmalloc.c    | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..133d52e507ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5288,6 +5288,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			continue;
 		}
 
+		if (tsk_is_oom_victim(current) ||
+		    (IS_ENABLED(CONFIG_DEBUG_VM) &&
+		     fatal_signal_pending(current)))
+			break;
+
 		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
 								pcp, pcp_list);
 		if (unlikely(!page)) {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c3b8e3e5cfc5..04b291076726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -38,6 +38,7 @@
 #include <linux/pgtable.h>
 #include <linux/uaccess.h>
 #include <linux/hugetlb.h>
+#include <linux/oom.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -2860,6 +2861,11 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		struct page *page;
 		int i;
 
+		if (tsk_is_oom_victim(current) ||
+		    (IS_ENABLED(CONFIG_DEBUG_VM) &&
+		     fatal_signal_pending(current)))
+			break;
+
 		page = alloc_pages_node(nid, gfp, order);
 		if (unlikely(!page))
 			break;
-- 
2.31.1


WARNING: multiple messages have this Message-ID (diff)
From: Vasily Averin <vvs-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
To: Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Vladimir Davydov
	<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Tetsuo Handa
	<penguin-kernel-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org
Subject: [PATCH mm] vmalloc: back off when the current task is OOM-killed
Date: Fri, 17 Sep 2021 11:06:49 +0300	[thread overview]
Message-ID: <d07a5540-3e07-44ba-1e59-067500f024d9@virtuozzo.com> (raw)
In-Reply-To: <YT8PEBbYZhLixEJD-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>

Huge vmalloc allocation on heavy loaded node can lead to a global
memory shortage. A task called vmalloc can have the worst badness
and be chosen by OOM-killer, however received fatal signal and
oom victim mark does not interrupt allocation cycle. Vmalloc will
continue allocating pages over and over again, exacerbating the crisis
and consuming the memory freed up by another killed tasks.

This patch allows OOM-killer to break vmalloc cycle, makes OOM more
effective and avoid host panic.

Unfortunately it is not 100% safe. Previous attempt to break vmalloc
cycle was reverted by commit b8c8a338f75e ("Revert "vmalloc: back off when
the current task is killed"") due to some vmalloc callers did not handled
failures properly. Found issues was resolved, however, there may
be other similar places.

Such failures may be acceptable for emergencies, such as OOM. On the other
hand, we would like to detect them earlier. However they are quite rare,
and will be hidden by OOM messages, so I'm afraid they wikk have quite
small chance of being noticed and reported.

To improve the detection of such places this patch also interrupts the vmalloc
allocation cycle for all fatal signals. The checks are hidden under DEBUG_VM
config option to do not break unaware production kernels.

Vmalloc uses new alloc_pages_bulk subsystem, so newly added checks can
affect other users of this subsystem.

Signed-off-by: Vasily Averin <vvs-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
---
 mm/page_alloc.c | 5 +++++
 mm/vmalloc.c    | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..133d52e507ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5288,6 +5288,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			continue;
 		}
 
+		if (tsk_is_oom_victim(current) ||
+		    (IS_ENABLED(CONFIG_DEBUG_VM) &&
+		     fatal_signal_pending(current)))
+			break;
+
 		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
 								pcp, pcp_list);
 		if (unlikely(!page)) {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c3b8e3e5cfc5..04b291076726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -38,6 +38,7 @@
 #include <linux/pgtable.h>
 #include <linux/uaccess.h>
 #include <linux/hugetlb.h>
+#include <linux/oom.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -2860,6 +2861,11 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		struct page *page;
 		int i;
 
+		if (tsk_is_oom_victim(current) ||
+		    (IS_ENABLED(CONFIG_DEBUG_VM) &&
+		     fatal_signal_pending(current)))
+			break;
+
 		page = alloc_pages_node(nid, gfp, order);
 		if (unlikely(!page))
 			break;
-- 
2.31.1


  reply	other threads:[~2021-09-17  8:08 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10 12:39 [PATCH memcg] memcg: prohibit unconditional exceeding the limit of dying tasks Vasily Averin
2021-09-10 13:04 ` Tetsuo Handa
2021-09-10 13:04   ` Tetsuo Handa
2021-09-10 13:20   ` Vasily Averin
2021-09-10 13:20     ` Vasily Averin
2021-09-10 14:55     ` Michal Hocko
2021-09-13  8:29       ` Vasily Averin
2021-09-13  8:29         ` Vasily Averin
2021-09-13  8:42         ` Michal Hocko
2021-09-13  8:42           ` Michal Hocko
2021-09-17  8:06           ` Vasily Averin [this message]
2021-09-17  8:06             ` [PATCH mm] vmalloc: back off when the current task is OOM-killed Vasily Averin
2021-09-19 23:31             ` Andrew Morton
2021-09-19 23:31               ` Andrew Morton
2021-09-20  1:22               ` Tetsuo Handa
2021-09-20 10:59                 ` Vasily Averin
2021-09-20 10:59                   ` Vasily Averin
2021-09-21 18:55                   ` Andrew Morton
2021-09-22  6:18                     ` Vasily Averin
2021-09-22 12:27             ` Michal Hocko
2021-09-22 12:27               ` Michal Hocko
2021-09-23  6:49               ` Vasily Averin
2021-09-23  6:49                 ` Vasily Averin
2021-09-24  7:55                 ` Michal Hocko
2021-09-24  7:55                   ` Michal Hocko
2021-09-27  9:36                   ` Vasily Averin
2021-09-27  9:36                     ` Vasily Averin
2021-09-27 11:08                     ` Michal Hocko
2021-09-27 11:08                       ` Michal Hocko
2021-10-05 13:52                       ` [PATCH mm v2] " Vasily Averin
2021-10-05 13:52                         ` Vasily Averin
2021-10-05 14:00                         ` Vasily Averin
2021-10-05 14:00                           ` Vasily Averin
2021-10-07 10:47                         ` Michal Hocko
2021-10-07 10:47                           ` Michal Hocko
2021-10-07 19:55                         ` Andrew Morton
2021-10-07 19:55                           ` Andrew Morton
2021-09-10 13:07 ` [PATCH memcg] memcg: prohibit unconditional exceeding the limit of dying tasks Vasily Averin
2021-09-10 13:07   ` Vasily Averin
2021-09-13  7:51 ` Vasily Averin
2021-09-13  7:51   ` Vasily Averin
2021-09-13  8:39   ` Michal Hocko
2021-09-13  8:39     ` Michal Hocko
2021-09-13  9:37     ` Vasily Averin
2021-09-13  9:37       ` Vasily Averin
2021-09-13 10:10       ` Michal Hocko
2021-09-13 10:10         ` Michal Hocko
2021-09-13  8:53 ` Michal Hocko
2021-09-13 10:35   ` Vasily Averin
2021-09-13 10:35     ` Vasily Averin
2021-09-13 10:55     ` Michal Hocko
2021-09-13 10:55       ` Michal Hocko
2021-09-14 10:01       ` Vasily Averin
2021-09-14 10:01         ` Vasily Averin
2021-09-14 10:10         ` [PATCH memcg v2] " Vasily Averin
2021-09-14 10:10           ` Vasily Averin
2021-09-16 12:55           ` Michal Hocko
2021-09-16 12:55             ` Michal Hocko
2021-10-05 13:52             ` [PATCH memcg v3] " Vasily Averin
2021-10-05 13:52               ` Vasily Averin
2021-10-05 14:55               ` Michal Hocko
2021-10-05 14:55                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d07a5540-3e07-44ba-1e59-067500f024d9@virtuozzo.com \
    --to=vvs@virtuozzo.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.