All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, oleg@redhat.com, vdavydov@virtuozzo.com,
	rientjes@google.com
Subject: Re: [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE
Date: Sat, 25 Jun 2016 00:54:09 +0900	[thread overview]
Message-ID: <201606250054.AIF67056.OOSLVtMOJFFFQH@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20160624123953.GC20203@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 24-06-16 20:02:01, Tetsuo Handa wrote:
> > Currently, the OOM reaper calls exit_oom_victim() on remote TIF_MEMDIE
> > thread after an OOM reap attempt was made. This behavior is intended
> > for allowing oom_scan_process_thread() to select next OOM victim by
> > making atomic_read(&task->signal->oom_victims) == 0.
> > 
> > But since threads can be blocked for unbounded period at __mmput() from
> > mmput() from exit_mm() from do_exit(), we can't risk the OOM reaper
> > being blocked for unbounded period waiting for TIF_MEMDIE threads.
> > Therefore, when we hit a situation that a TIF_MEMDIE thread which is
> > the only thread of that thread group reached tsk->mm = NULL line in
> > exit_mm() from do_exit() before __oom_reap_task() finds a mm via
> > find_lock_task_mm(), oom_reap_task() does not wait for the TIF_MEMDIE
> > thread to return from __mmput() and instead calls exit_oom_victim().
> > 
> > Patch "mm, oom: hide mm which is shared with kthread or global init"
> > tried to avoid OOM livelock by setting MMF_OOM_REAPED, but it is racy
> > because setting MMF_OOM_REAPED will not help when find_lock_task_mm()
> > in oom_scan_process_thread() failed.
> 
> I haven't thought that through yet (I will wait for the monday fresh
> brain) but wouldn't the following be sufficient?
> ---
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4c21f744daa6..72360d7284a6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -295,7 +295,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  			if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
>  				ret = OOM_SCAN_CONTINUE;
>  			task_unlock(p);
> -		}
> +		} else if (task->state == EXIT_ZOMBIE)
> +			ret = OOM_SCAN_CONTINUE;

I think EXIT_ZOMBIE is too late, for it is exit_notify() stage from do_exit()
which sets EXIT_ZOMBIE state.

The stage my patch is trying to rescue is __mmput() from mmput() from
exit_mm() from do_exit(). It is something like doing

----------
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4c21f74..f1f892e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -289,11 +289,11 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 */
 	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims)) {
 		struct task_struct *p = find_lock_task_mm(task);
-		enum oom_scan_t ret = OOM_SCAN_ABORT;
+		enum oom_scan_t ret = OOM_SCAN_CONTINUE;
 
 		if (p) {
-			if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
-				ret = OOM_SCAN_CONTINUE;
+			if (!test_bit(MMF_OOM_REAPED, &p->mm->flags))
+				ret = OOM_SCAN_ABORT;
 			task_unlock(p);
 		}
 
----------

which is effectively nearly equivalent with doing

----------
diff --git a/kernel/exit.c b/kernel/exit.c
index 84ae830..ea188a7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -509,9 +509,9 @@ static void exit_mm(struct task_struct *tsk)
 	enter_lazy_tlb(mm, current);
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
-	mmput(mm);
 	if (test_thread_flag(TIF_MEMDIE))
 		exit_oom_victim(tsk);
+	mmput(mm);
 }
 
 static struct task_struct *find_alive_thread(struct task_struct *p)
----------

because we don't want to risk the OOM killer wait forever when __mmput() from
mmput() from exit_mm() from do_exit() is blocked at memory allocation.

If we had some timer or timeout, we can call

 	if (test_thread_flag(TIF_MEMDIE))
 		exit_oom_victim(tsk);

when __mmput() from mmput() from exit_mm() from do_exit() is blocked for
too long. But since we don't put !can_oom_reap TIF_MEMDIE thread under the
OOM reaper's supervision, we need to do something equivalent to calling
exit_oom_victim(tsk) early, at the cost of increasing possibility of
needlessly selecting next OOM victim.

>  
>  		return ret;
>  	}
> @@ -592,14 +593,7 @@ static void oom_reap_task(struct task_struct *tsk)
>  		debug_show_all_locks();
>  	}
>  
> -	/*
> -	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
> -	 * reasonably reclaimable memory anymore or it is not a good candidate
> -	 * for the oom victim right now because it cannot release its memory
> -	 * itself nor by the oom reaper.
> -	 */
>  	tsk->oom_reaper_list = NULL;
> -	exit_oom_victim(tsk);
>  
>  	/* Drop a reference taken by wake_oom_reaper */
>  	put_task_struct(tsk);
> -- 
> Michal Hocko
> SUSE Labs
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-06-24 15:54 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-24 11:02 [PATCH] mm,oom: use per signal_struct flag rather than clear TIF_MEMDIE Tetsuo Handa
2016-06-24 12:39 ` Michal Hocko
2016-06-24 15:54   ` Tetsuo Handa [this message]
2016-06-24 22:42     ` Oleg Nesterov
2016-06-24 21:56 ` Oleg Nesterov
2016-06-25  5:44   ` Tetsuo Handa
2016-06-27  9:23     ` Michal Hocko
2016-06-27 10:36       ` Michal Hocko
2016-06-27 15:51         ` Oleg Nesterov
2016-06-27 16:06           ` Michal Hocko
2016-06-27 17:55             ` Oleg Nesterov
2016-06-28 10:19               ` Michal Hocko
2016-06-29  0:13                 ` Oleg Nesterov
2016-06-29  8:33                   ` Michal Hocko
2016-06-29 14:19                     ` Michal Hocko
2016-07-01 10:15                       ` Tetsuo Handa
2016-06-29 20:01                     ` Oleg Nesterov
2016-06-30  7:59                       ` Michal Hocko
2016-06-30 10:51                         ` Tetsuo Handa
2016-06-30 11:21                           ` Michal Hocko
2016-07-03 13:32                           ` Oleg Nesterov
2016-07-03 13:21                         ` Oleg Nesterov
2016-07-07 11:51                           ` Michal Hocko
2016-07-07 16:42                             ` Oleg Nesterov
2016-06-29 20:14                 ` Oleg Nesterov
2016-06-30  8:07                   ` Michal Hocko
2016-07-03 13:24                     ` Oleg Nesterov
2016-06-27 21:09       ` Oleg Nesterov
2016-06-28 10:26         ` Michal Hocko
2016-06-29 19:34           ` Oleg Nesterov
2016-06-27 20:40     ` Oleg Nesterov
2016-06-28 10:29       ` Michal Hocko
2016-06-29 20:24         ` Oleg Nesterov
2016-06-30  8:16           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201606250054.AIF67056.OOSLVtMOJFFFQH@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=rientjes@google.com \
    --cc=vdavydov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.