All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: akpm@linux-foundation.org, rientjes@google.com,
	linux-mm@kvack.org, oleg@redhat.com
Subject: Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
Date: Wed, 18 May 2016 16:15:45 +0200	[thread overview]
Message-ID: <20160518141545.GI21654@dhcp22.suse.cz> (raw)
In-Reply-To: <201605182230.IDC73435.MVSOHLFOQFOJtF@I-love.SAKURA.ne.jp>

On Wed 18-05-16 22:30:14, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 18-05-16 21:20:24, Tetsuo Handa wrote:
> > > Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> > > kernel panics"), select_bad_process() is using for_each_process_thread().
> > > 
> > > Since oom_unkillable_task() scans all threads in the caller's thread group
> > > and oom_task_origin() scans signal_struct of the caller's thread group, we
> > > don't need to call oom_unkillable_task() and oom_task_origin() on each
> > > thread. Also, since !mm test will be done later at oom_badness(), we don't
> > > need to do !mm test on each thread. Therefore, we only need to do
> > > TIF_MEMDIE test on each thread.
> > > 
> > > If we track number of TIF_MEMDIE threads inside signal_struct, we don't
> > > need to do TIF_MEMDIE test on each thread. This will allow
> > > select_bad_process() to use for_each_process().
> > 
> > I am wondering whether signal_struct is the best way forward. The oom
> > killing is more about mm_struct than anything else. We can record that
> > the mm was oom killed in mm->flags (similar to MMF_OOM_REAPED). I guess
> > this would require more work at this stage so maybe starting with signal
> > struct is not that bad afterall. Just thinking...
> 
> Even if you call p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN case a bug,
> (p->flags & PF_KTHREAD) || is_global_init(p) case is still possible.

I couldn't care less about such a case to be honest, and that is not a
reason the cripple the code for such an insanity. There simply doesn't
make any sense to share init's mm with a different task.

> Thus, I think we can't mark the mm was oom killed in mm->flags.

[...]
> >From d770bd777e628e9d1ae250249433cf576aae8961 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 18 May 2016 22:17:47 +0900
> Subject: [PATCH v3] mm,oom: speed up select_bad_process() loop.
> 
> Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> kernel panics"), select_bad_process() is using for_each_process_thread().
> 
> Since oom_unkillable_task() scans all threads in the caller's thread group
> and oom_task_origin() scans signal_struct of the caller's thread group, we
> don't need to call oom_unkillable_task() and oom_task_origin() on each
> thread. Also, since !mm test will be done later at oom_badness(), we don't
> need to do !mm test on each thread. Therefore, we only need to do
> TIF_MEMDIE test on each thread.
> 
> If we track number of TIF_MEMDIE threads inside signal_struct, we don't
> need to do TIF_MEMDIE test on each thread. This will allow
> select_bad_process() to use for_each_process().
> 
> This patch adds a counter to signal_struct for tracking how many
> TIF_MEMDIE threads are in a given thread group, and check it at
> oom_scan_process_thread() so that select_bad_process() can use
> for_each_process() rather than for_each_process_thread().

OK, this looks correct. Strictly speaking the patch is missing any note
on _why_ this is needed or an improvement. I would add something like
the following:
"
Although the original code was correct it was quite inefficient because
each thread group was scanned num_threads times which can be a lot
especially with processes with many threads. Even though the OOM is
extremely cold path it is always good to be as effective as possible
when we are inside rcu_read_lock() - aka unpreemptible context.
"

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Oleg Nesterov <oleg@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/sched.h |  1 +
>  mm/oom_kill.c         | 14 ++++++--------
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 870a700..1589f8e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -794,6 +794,7 @@ struct signal_struct {
>  	struct tty_audit_buf *tty_audit_buf;
>  #endif
>  
> +	atomic_t oom_victims; /* # of TIF_MEDIE threads in this thread group */
>  	/*
>  	 * Thread is the potential origin of an oom condition; kill first on
>  	 * oom
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c0e37dd..8e151d0 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -283,12 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  	 * This task already has access to memory reserves and is being killed.
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
> -	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (!is_sysrq_oom(oc))
> -			return OOM_SCAN_ABORT;
> -	}
> -	if (!task->mm)
> -		return OOM_SCAN_CONTINUE;
> +	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
> +		return OOM_SCAN_ABORT;
>  
>  	/*
>  	 * If task is allocating a lot of memory and has been marked to be
> @@ -307,12 +303,12 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  static struct task_struct *select_bad_process(struct oom_control *oc,
>  		unsigned int *ppoints, unsigned long totalpages)
>  {
> -	struct task_struct *g, *p;
> +	struct task_struct *p;
>  	struct task_struct *chosen = NULL;
>  	unsigned long chosen_points = 0;
>  
>  	rcu_read_lock();
> -	for_each_process_thread(g, p) {
> +	for_each_process(p) {
>  		unsigned int points;
>  
>  		switch (oom_scan_process_thread(oc, p, totalpages)) {
> @@ -673,6 +669,7 @@ void mark_oom_victim(struct task_struct *tsk)
>  	/* OOM killer might race with memcg OOM */
>  	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
>  		return;
> +	atomic_inc(&tsk->signal->oom_victims);
>  	/*
>  	 * Make sure that the task is woken up from uninterruptible sleep
>  	 * if it is frozen because OOM killer wouldn't be able to free
> @@ -690,6 +687,7 @@ void exit_oom_victim(struct task_struct *tsk)
>  {
>  	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
>  		return;
> +	atomic_dec(&tsk->signal->oom_victims);
>  
>  	if (!atomic_dec_return(&oom_victims))
>  		wake_up_all(&oom_victims_wait);
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-05-18 14:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-18 12:20 [PATCH v2] mm,oom: speed up select_bad_process() loop Tetsuo Handa
2016-05-18 12:51 ` Michal Hocko
2016-05-18 13:30   ` [PATCH v3] " Tetsuo Handa
2016-05-18 14:15     ` Michal Hocko [this message]
2016-05-18 21:09       ` Andrew Morton
2016-05-19  6:53         ` Michal Hocko
2016-05-19  7:17           ` Michal Hocko
2016-05-19  8:17             ` Michal Hocko
2016-05-20  1:50           ` Oleg Nesterov
2016-05-20  2:13             ` Oleg Nesterov
2016-05-20  6:42             ` Michal Hocko
2016-05-20 22:04               ` Oleg Nesterov
2016-05-18 14:44     ` Tetsuo Handa
2016-05-20  7:50     ` Michal Hocko
2016-05-20 11:51       ` Tetsuo Handa
2016-05-20 12:09         ` Michal Hocko
2016-05-20 13:41           ` Tetsuo Handa
2016-05-20 13:44             ` Tetsuo Handa
2016-05-20 15:23             ` Michal Hocko
2016-05-20 15:56               ` Tetsuo Handa
2016-05-23  7:55                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160518141545.GI21654@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.