All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vasily Averin <vvs@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel@openvz.org
Subject: [PATCH memcg v2] memcg: prohibit unconditional exceeding the limit of dying tasks
Date: Tue, 14 Sep 2021 13:10:04 +0300	[thread overview]
Message-ID: <817a6ce2-4da9-72ac-c5b9-edd398d28a15@virtuozzo.com> (raw)
In-Reply-To: <bab6c1d2-38d8-9098-206f-54894f9871b6@virtuozzo.com>

The kernel currently allows dying tasks to exceed the memcg limits.
The allocation is expected to be the last one and the occupied memory
will be freed soon.
This is not always true because it can be part of the huge vmalloc
allocation. Allowed once, they will repeat over and over again.
Moreover lifetime of the allocated object can differ from the lifetime
of the dying task.
Multiple such allocations running concurrently can not only overuse
the memcg limit, but can lead to a global out of memory and,
in the worst case, cause the host to panic.

This patch removes checks forced exceed of the memcg limit for dying
tasks. Also it breaks endless loop for tasks bypassed by the oom killer.
In addition, it renames should_force_charge() helper to task_is_dying()
because now its use do not lead to the forced charge.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 mm/memcontrol.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 389b5766e74f..707f6640edda 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -234,7 +234,7 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-static inline bool should_force_charge(void)
+static inline bool task_is_dying(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
 		(current->flags & PF_EXITING);
@@ -1607,7 +1607,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	ret = task_is_dying() || out_of_memory(&oc);
 
 unlock:
 	mutex_unlock(&oom_lock);
@@ -2588,6 +2588,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct page_counter *counter;
 	enum oom_status oom_status;
 	unsigned long nr_reclaimed;
+	bool passed_oom = false;
 	bool may_swap = true;
 	bool drained = false;
 	unsigned long pflags;
@@ -2622,15 +2623,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_ATOMIC)
 		goto force;
 
-	/*
-	 * Unlike in global OOM situations, memcg is not in a physical
-	 * memory shortage.  Allow dying and OOM-killed tasks to
-	 * bypass the last charges so that they can exit quickly and
-	 * free their memory.
-	 */
-	if (unlikely(should_force_charge()))
-		goto force;
-
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
@@ -2688,8 +2680,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	if (fatal_signal_pending(current))
-		goto force;
+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
 
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
@@ -2698,14 +2691,10 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
 		       get_order(nr_pages * PAGE_SIZE));
-	switch (oom_status) {
-	case OOM_SUCCESS:
+	if (oom_status == OOM_SUCCESS) {
+		passed_oom = true;
 		nr_retries = MAX_RECLAIM_RETRIES;
 		goto retry;
-	case OOM_FAILED:
-		goto force;
-	default:
-		goto nomem;
 	}
 nomem:
 	if (!(gfp_mask & __GFP_NOFAIL))
-- 
2.31.1


WARNING: multiple messages have this Message-ID (diff)
From: Vasily Averin <vvs-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
To: Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Vladimir Davydov
	<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Tetsuo Handa
	<penguin-kernel-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org
Subject: [PATCH memcg v2] memcg: prohibit unconditional exceeding the limit of dying tasks
Date: Tue, 14 Sep 2021 13:10:04 +0300	[thread overview]
Message-ID: <817a6ce2-4da9-72ac-c5b9-edd398d28a15@virtuozzo.com> (raw)
In-Reply-To: <bab6c1d2-38d8-9098-206f-54894f9871b6-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>

The kernel currently allows dying tasks to exceed the memcg limits.
The allocation is expected to be the last one and the occupied memory
will be freed soon.
This is not always true because it can be part of the huge vmalloc
allocation. Allowed once, they will repeat over and over again.
Moreover lifetime of the allocated object can differ from the lifetime
of the dying task.
Multiple such allocations running concurrently can not only overuse
the memcg limit, but can lead to a global out of memory and,
in the worst case, cause the host to panic.

This patch removes checks forced exceed of the memcg limit for dying
tasks. Also it breaks endless loop for tasks bypassed by the oom killer.
In addition, it renames should_force_charge() helper to task_is_dying()
because now its use do not lead to the forced charge.

Suggested-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
Signed-off-by: Vasily Averin <vvs-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
---
 mm/memcontrol.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 389b5766e74f..707f6640edda 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -234,7 +234,7 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-static inline bool should_force_charge(void)
+static inline bool task_is_dying(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
 		(current->flags & PF_EXITING);
@@ -1607,7 +1607,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	ret = task_is_dying() || out_of_memory(&oc);
 
 unlock:
 	mutex_unlock(&oom_lock);
@@ -2588,6 +2588,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct page_counter *counter;
 	enum oom_status oom_status;
 	unsigned long nr_reclaimed;
+	bool passed_oom = false;
 	bool may_swap = true;
 	bool drained = false;
 	unsigned long pflags;
@@ -2622,15 +2623,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_ATOMIC)
 		goto force;
 
-	/*
-	 * Unlike in global OOM situations, memcg is not in a physical
-	 * memory shortage.  Allow dying and OOM-killed tasks to
-	 * bypass the last charges so that they can exit quickly and
-	 * free their memory.
-	 */
-	if (unlikely(should_force_charge()))
-		goto force;
-
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
@@ -2688,8 +2680,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	if (fatal_signal_pending(current))
-		goto force;
+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
 
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
@@ -2698,14 +2691,10 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
 		       get_order(nr_pages * PAGE_SIZE));
-	switch (oom_status) {
-	case OOM_SUCCESS:
+	if (oom_status == OOM_SUCCESS) {
+		passed_oom = true;
 		nr_retries = MAX_RECLAIM_RETRIES;
 		goto retry;
-	case OOM_FAILED:
-		goto force;
-	default:
-		goto nomem;
 	}
 nomem:
 	if (!(gfp_mask & __GFP_NOFAIL))
-- 
2.31.1


  reply	other threads:[~2021-09-14 10:10 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10 12:39 [PATCH memcg] memcg: prohibit unconditional exceeding the limit of dying tasks Vasily Averin
2021-09-10 13:04 ` Tetsuo Handa
2021-09-10 13:04   ` Tetsuo Handa
2021-09-10 13:20   ` Vasily Averin
2021-09-10 13:20     ` Vasily Averin
2021-09-10 14:55     ` Michal Hocko
2021-09-13  8:29       ` Vasily Averin
2021-09-13  8:29         ` Vasily Averin
2021-09-13  8:42         ` Michal Hocko
2021-09-13  8:42           ` Michal Hocko
2021-09-17  8:06           ` [PATCH mm] vmalloc: back off when the current task is OOM-killed Vasily Averin
2021-09-17  8:06             ` Vasily Averin
2021-09-19 23:31             ` Andrew Morton
2021-09-19 23:31               ` Andrew Morton
2021-09-20  1:22               ` Tetsuo Handa
2021-09-20 10:59                 ` Vasily Averin
2021-09-20 10:59                   ` Vasily Averin
2021-09-21 18:55                   ` Andrew Morton
2021-09-22  6:18                     ` Vasily Averin
2021-09-22 12:27             ` Michal Hocko
2021-09-22 12:27               ` Michal Hocko
2021-09-23  6:49               ` Vasily Averin
2021-09-23  6:49                 ` Vasily Averin
2021-09-24  7:55                 ` Michal Hocko
2021-09-24  7:55                   ` Michal Hocko
2021-09-27  9:36                   ` Vasily Averin
2021-09-27  9:36                     ` Vasily Averin
2021-09-27 11:08                     ` Michal Hocko
2021-09-27 11:08                       ` Michal Hocko
2021-10-05 13:52                       ` [PATCH mm v2] " Vasily Averin
2021-10-05 13:52                         ` Vasily Averin
2021-10-05 14:00                         ` Vasily Averin
2021-10-05 14:00                           ` Vasily Averin
2021-10-07 10:47                         ` Michal Hocko
2021-10-07 10:47                           ` Michal Hocko
2021-10-07 19:55                         ` Andrew Morton
2021-10-07 19:55                           ` Andrew Morton
2021-09-10 13:07 ` [PATCH memcg] memcg: prohibit unconditional exceeding the limit of dying tasks Vasily Averin
2021-09-10 13:07   ` Vasily Averin
2021-09-13  7:51 ` Vasily Averin
2021-09-13  7:51   ` Vasily Averin
2021-09-13  8:39   ` Michal Hocko
2021-09-13  8:39     ` Michal Hocko
2021-09-13  9:37     ` Vasily Averin
2021-09-13  9:37       ` Vasily Averin
2021-09-13 10:10       ` Michal Hocko
2021-09-13 10:10         ` Michal Hocko
2021-09-13  8:53 ` Michal Hocko
2021-09-13 10:35   ` Vasily Averin
2021-09-13 10:35     ` Vasily Averin
2021-09-13 10:55     ` Michal Hocko
2021-09-13 10:55       ` Michal Hocko
2021-09-14 10:01       ` Vasily Averin
2021-09-14 10:01         ` Vasily Averin
2021-09-14 10:10         ` Vasily Averin [this message]
2021-09-14 10:10           ` [PATCH memcg v2] " Vasily Averin
2021-09-16 12:55           ` Michal Hocko
2021-09-16 12:55             ` Michal Hocko
2021-10-05 13:52             ` [PATCH memcg v3] " Vasily Averin
2021-10-05 13:52               ` Vasily Averin
2021-10-05 14:55               ` Michal Hocko
2021-10-05 14:55                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=817a6ce2-4da9-72ac-c5b9-edd398d28a15@virtuozzo.com \
    --to=vvs@virtuozzo.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.