From: Vasily Averin <vvs@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <guro@fb.com>, Uladzislau Rezki <urezki@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Shakeel Butt <shakeelb@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel@openvz.org
Subject: [PATCH memcg v2 2/2] memcg: prohibit unconditional exceeding the limit of dying tasks
Date: Fri, 22 Oct 2021 11:11:29 +0300
Message-ID: <4b315938-5600-b7f5-bde9-82f638a2e595@virtuozzo.com>
In-Reply-To: <cover.1634889066.git.vvs@virtuozzo.com>

Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit. The assumption is that the amount of memory charged by those
tasks is bounded and that most of it will be released while the task
is exiting. This resembles the heuristic for the global OOM situation,
in which tasks get access to memory reserves. There is no global memory
shortage at the memcg level, so the memcg heuristic is more relaxed.

The above assumption is overly optimistic, though. E.g. vmalloc can scale
to really large requests and the heuristic would allow that. We used to
have an early break in the vmalloc allocator for killed tasks, but this
has been reverted by commit b8c8a338f75e ("Revert "vmalloc: back off when
the current task is killed""). There are likely other similar code paths
which do not check for fatal signals in an allocation&charge loop.
Also, some kernel objects charged to a memcg are not bound to a process
lifetime.
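
To make the overshoot concrete, here is a minimal userspace sketch of the
pre-patch behaviour. It is not kernel code: model_memcg, model_old_charge()
and model_big_alloc() are hypothetical names invented for illustration, and
the model ignores reclaim and the OOM killer entirely. It only shows that a
loop which never checks for fatal signals can charge an arbitrary amount
once the task is treated as dying.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memcg: pages charged and the hard limit. */
struct model_memcg {
	unsigned long usage;
	unsigned long limit;
};

/*
 * Heavily simplified model of the old heuristic: a dying task skips the
 * limit check, everybody else gets -ENOMEM once the limit is reached.
 */
int model_old_charge(struct model_memcg *cg, bool dying, unsigned long nr_pages)
{
	if (dying || cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}
	return -ENOMEM;
}

/*
 * Models an allocate-and-charge loop (think of a large vmalloc) that
 * charges one page at a time and never looks at pending fatal signals.
 * Returns the number of pages that were charged.
 */
unsigned long model_big_alloc(struct model_memcg *cg, bool dying,
			      unsigned long total_pages)
{
	unsigned long done = 0;

	while (done < total_pages && model_old_charge(cg, dying, 1) == 0)
		done++;
	return done;
}
```

In this model a live task asking for 1000 pages under a 100 page limit
stops at 100, while a dying task gets all 1000 and leaves the memcg at ten
times its limit.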

It has been observed that it is not really hard to trigger these
bypasses and cause a global OOM situation.

One potential way to address these runaways would be to limit the amount
of excess (similar to the global OOM with limited oom reserves). This is
certainly possible but it is not really clear how much of an excess is
desirable and still protects from global OOMs as that would have to
consider the overall memcg configuration.

This patch addresses the problem by removing the heuristic
altogether. Bypass is only allowed for requests which either cannot fail
or where the failure is not desirable while excess should be still
limited (e.g. atomic requests). Implementation-wise, a killed or dying
task fails to charge if it has passed the OOM killer stage. That should
give all forms of reclaim a chance to restore the limit before the
failure (ENOMEM) and tell the caller to back off.
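
The resulting control flow can be sketched as a small userspace model.
This is an illustration under stated assumptions, not the kernel
implementation: every sim_* name and the struct fields are hypothetical,
reclaim and the OOM killer are reduced to counters, and the nofail
parameter stands in for __GFP_NOFAIL. Only the passed_oom logic follows
the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SIM_MAX_RECLAIM_RETRIES 3	/* hypothetical retry budget */

struct sim_task {
	bool dying;			/* models task_is_dying() */
};

struct sim_memcg {
	unsigned long usage;		/* pages currently charged */
	unsigned long limit;		/* hard limit in pages */
	unsigned long reclaimable;	/* pages reclaim is able to free */
	unsigned int oom_victims;	/* tasks the OOM killer may still kill */
	unsigned long victim_pages;	/* pages freed per killed task */
};

static unsigned long sim_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Free up to 'want' pages; report whether any progress was made. */
static bool sim_reclaim(struct sim_memcg *cg, unsigned long want)
{
	unsigned long freed = sim_min(sim_min(cg->reclaimable, want), cg->usage);

	cg->reclaimable -= freed;
	cg->usage -= freed;
	return freed > 0;
}

/* Mirrors "task_is_dying() || out_of_memory()": dying tasks kill nobody. */
static bool sim_oom(struct sim_memcg *cg, const struct sim_task *task)
{
	if (task->dying)
		return true;
	if (cg->oom_victims == 0)
		return false;
	cg->oom_victims--;
	cg->usage -= sim_min(cg->victim_pages, cg->usage);
	return true;
}

int sim_try_charge(struct sim_memcg *cg, const struct sim_task *task,
		   unsigned long nr_pages, bool nofail)
{
	bool passed_oom = false;
	int nr_retries = SIM_MAX_RECLAIM_RETRIES;

	assert(nr_pages > 0);
retry:
	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}
	if (sim_reclaim(cg, nr_pages))
		goto retry;
	if (nr_retries--)
		goto retry;

	/* A dying task that already went through the OOM stage backs off. */
	if (passed_oom && task->dying)
		goto nomem;

	if (sim_oom(cg, task)) {
		passed_oom = true;
		nr_retries = SIM_MAX_RECLAIM_RETRIES;
		goto retry;
	}
nomem:
	if (!nofail)
		return -ENOMEM;
	/* Only requests that cannot fail are still charged over the limit. */
	cg->usage += nr_pages;
	return 0;
}
```

In this model a dying task still succeeds when reclaim can make room, gets
-ENOMEM once reclaim and the OOM stage have both been tried, and exceeds
the limit only when the request is marked as unable to fail.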

In addition, this patch renames the should_force_charge() helper
to task_is_dying() because its use is no longer associated with forced
charging.

Fixes: a636b327f731 ("memcg: avoid unnecessary system-wide-oom-killer")
Cc: stable@vger.kernel.org
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 mm/memcontrol.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6da5020a8656..87e41c3cac10 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -239,7 +239,7 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-static inline bool should_force_charge(void)
+static inline bool task_is_dying(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
 		(current->flags & PF_EXITING);
@@ -1575,7 +1575,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	ret = task_is_dying() || out_of_memory(&oc);
 
 unlock:
 	mutex_unlock(&oom_lock);
@@ -2530,6 +2530,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct page_counter *counter;
 	enum oom_status oom_status;
 	unsigned long nr_reclaimed;
+	bool passed_oom = false;
 	bool may_swap = true;
 	bool drained = false;
 	unsigned long pflags;
@@ -2564,15 +2565,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_ATOMIC)
 		goto force;
 
-	/*
-	 * Unlike in global OOM situations, memcg is not in a physical
-	 * memory shortage.  Allow dying and OOM-killed tasks to
-	 * bypass the last charges so that they can exit quickly and
-	 * free their memory.
-	 */
-	if (unlikely(should_force_charge()))
-		goto force;
-
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
@@ -2630,8 +2622,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	if (fatal_signal_pending(current))
-		goto force;
+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
 
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
@@ -2640,14 +2633,10 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
 		       get_order(nr_pages * PAGE_SIZE));
-	switch (oom_status) {
-	case OOM_SUCCESS:
+	if (oom_status == OOM_SUCCESS) {
+		passed_oom = true;
 		nr_retries = MAX_RECLAIM_RETRIES;
 		goto retry;
-	case OOM_FAILED:
-		goto force;
-	default:
-		goto nomem;
 	}
 nomem:
 	if (!(gfp_mask & __GFP_NOFAIL))
-- 
2.32.0



