From: Michal Hocko <mhocko@kernel.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: penguin-kernel@i-love.sakura.ne.jp, rientjes@google.com,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm, oom: check memcg margin for parallel oom
Date: Tue, 14 Jul 2020 14:37:26 +0200
Message-ID: <20200714123726.GI24642@dhcp22.suse.cz>
In-Reply-To: <1594728512-18969-1-git-send-email-laoar.shao@gmail.com>

On Tue 14-07-20 08:08:32, Yafang Shao wrote:
> Commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM
> killer") resolves the problem of different threads in a multi-threaded
> task doing parallel memcg oom, but it doesn't solve the problem of
> different tasks doing parallel memcg oom.
> 
> It may happen that many different tasks in the same memcg are waiting on
> oom_lock at the same time. If one of them has already made progress and
> freed enough available memory, the others don't need to trigger the oom
> killer again. Checking the memcg margin after taking oom_lock helps
> achieve this.

While the changelog makes sense, I believe it can be improved. I would
use something like the following. Feel free to use any parts of it.

"
Memcg oom killer invocation is synchronized by the global oom_lock and
tasks are sleeping on the lock while somebody is selecting the victim or
potentially while the oom_reaper is releasing the victim's memory.
This can result in a pointless oom killer invocation because a waiter
might be racing with the oom_reaper:

	P1		oom_reaper		P2
			oom_reap_task		mutex_lock(oom_lock)
						out_of_memory # no victim because we have one already
			__oom_reap_task_mm	mutex_unlock(oom_lock)
mutex_lock(oom_lock)
			set MMF_OOM_SKIP
select_bad_process
# finds a new victim

The page allocator prevents this race by trying the allocation again once
the lock has been acquired (in __alloc_pages_may_oom), which acts as a
last minute check. Moreover, the page allocator doesn't block on the
oom_lock at all and simply retries the whole reclaim process.

The memcg oom killer should do the last minute check as well. Call
mem_cgroup_margin to do that. A trylock on the oom_lock could be done as
well, but this doesn't seem to be necessary at this stage.
"

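For reference, the page allocator's last minute check mentioned above
boils down to the following pattern (heavily abridged from
__alloc_pages_may_oom(); gfp flag handling and the failure paths are left
out, and page/ac/oc/did_some_progress come from the surrounding allocator
context, so take it as an illustration of the pattern rather than the
exact upstream code):

	if (!mutex_trylock(&oom_lock)) {
		/* Somebody else is handling the oom for us. */
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}

	/*
	 * Last minute check: retry the allocation with a high watermark
	 * now that oom_lock is held. A parallel oom kill or the
	 * oom_reaper may have released memory in the meantime.
	 */
	page = get_page_from_freelist(gfp_mask, order,
				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
	if (page)
		goto out;

	/* Only now fall back to selecting and killing a victim. */
	if (out_of_memory(&oc))
		*did_some_progress = 1;
out:
	mutex_unlock(&oom_lock);
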
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/memcontrol.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1962232..df141e1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		.gfp_mask = gfp_mask,
>  		.order = order,
>  	};
> -	bool ret;
> +	bool ret = true;
>  
>  	if (mutex_lock_killable(&oom_lock))
>  		return true;
> +
>  	/*
>  	 * A few threads which were not waiting at mutex_lock_killable() can
>  	 * fail to bail out. Therefore, check again after holding oom_lock.
>  	 */
> -	ret = should_force_charge() || out_of_memory(&oc);
> +	if (should_force_charge())
> +		goto out;
> +
> +	/*
> +	 * Different tasks may be doing parallel oom, so after taking the
> +	 * oom_lock the task should check the memcg margin again to see
> +	 * whether another task has already made progress.
> +	 */
> +	if (mem_cgroup_margin(memcg) >= (1 << order))
> +		goto out;

Is there any reason why you haven't simply done this? (Plus your comment,
which is helpful.)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 248e6cad0095..2c176825efe3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1561,15 +1561,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.order = order,
 		.chosen_points = LONG_MIN,
 	};
-	bool ret;
+	bool ret = true;
 
-	if (mutex_lock_killable(&oom_lock))
+	if (!mutex_trylock(&oom_lock))
 		return true;
+
+	if (mem_cgroup_margin(memcg) >= (1 << order))
+		goto unlock;
+
 	/*
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
 	ret = should_force_charge() || out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }
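
To spell it out, with the above applied mem_cgroup_out_of_memory() would
end up looking roughly like this (a consolidated sketch against the tree
the diff is based on; the pre-existing comment about mutex_lock_killable()
is dropped here for brevity):

static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
				     int order)
{
	struct oom_control oc = {
		.zonelist = NULL,
		.nodemask = NULL,
		.memcg = memcg,
		.gfp_mask = gfp_mask,
		.order = order,
		.chosen_points = LONG_MIN,
	};
	bool ret = true;

	if (!mutex_trylock(&oom_lock))
		return true;

	/*
	 * Somebody might have already released memory on our behalf (e.g.
	 * the oom_reaper or a parallel oom kill) while we were waiting, so
	 * recheck the margin before invoking the oom killer again.
	 */
	if (mem_cgroup_margin(memcg) >= (1 << order))
		goto unlock;

	ret = should_force_charge() || out_of_memory(&oc);

unlock:
	mutex_unlock(&oom_lock);
	return ret;
}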
> +
> +	ret = out_of_memory(&oc);
> +
> +out:
>  	mutex_unlock(&oom_lock);
> +
>  	return ret;
>  }
>  
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


Thread overview: 9+ messages
2020-07-14 12:08 [PATCH] mm, oom: check memcg margin for parallel oom Yafang Shao
2020-07-14 12:37 ` Michal Hocko [this message]
2020-07-14 12:38   ` Michal Hocko
2020-07-14 13:26     ` Yafang Shao
2020-07-14 12:47   ` Tetsuo Handa
2020-07-14 12:57     ` Michal Hocko
2020-07-14 13:25   ` Yafang Shao
2020-07-14 13:34     ` Michal Hocko
2020-07-14 13:40       ` Yafang Shao
