From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: kosaki.motohiro@jp.fujitsu.com,
	Roland McGrath <roland@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH] oom: Make coredump interruptible
Date: Fri,  4 Jun 2010 19:54:43 +0900 (JST)
Message-ID: <20100604194635.72D3.A69D9226@jp.fujitsu.com>
In-Reply-To: <20100602203827.GA29244@redhat.com>

> On 06/02, Roland McGrath wrote:
> >
> > > when select_bad_process() finds the task P to kill it can participate
> > > in the core dump (sleep in exit_mm), but we should somehow inform the
> > > thread which actually dumps the core: P->mm->core_state->dumper.
> >
> > Perhaps it should simply do that: if you would choose P to oom-kill, and
> > P->mm->core_state!=NULL, then choose P->mm->core_state->dumper instead.
> 
> ... to set TIF_MEMDIE which should be checked in elf_core_dump().
> 
> Probably yes.

Yep, probably. But please allow me to explain a bit more.

In the multi-threaded OOM case, we have two problematic routines: coredump
and vmscan. Roland's idea can only solve the former.
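Roland's suggestion can be modeled roughly like this. It is only a user-space sketch: `struct task`, `struct mm` and `struct core_state` are hypothetical, heavily simplified stand-ins for the kernel structures, `memdie` models TIF_MEMDIE, and `oom_pick_target()`/`oom_mark()` are made-up names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct task;

struct core_state {
	struct task *dumper;	/* the thread actually writing the core file */
};

struct mm {
	struct core_state *core_state;	/* non-NULL while a core dump is in progress */
};

struct task {
	struct mm *mm;
	int memdie;		/* models TIF_MEMDIE */
};

/*
 * Roland's idea: if the chosen victim shares an mm that is being dumped,
 * mark the dumper instead, because the victim itself only sleeps in exit_mm().
 */
static struct task *oom_pick_target(struct task *p)
{
	assert(p != NULL);
	if (p->mm && p->mm->core_state)
		return p->mm->core_state->dumper;
	return p;
}

/* Mark whichever task should notice the OOM kill. */
static void oom_mark(struct task *p)
{
	oom_pick_target(p)->memdie = 1;
}
```

The redirect only helps a victim that is waiting on a core dump; a thread stuck in vmscan is not the dumper, so nothing here makes it exit sooner.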

But I would also like vmscan to exit quickly once OOM has been received. If
other threads get stuck in vmscan trying to free additional pages (this is
very typical, because usually every thread calls some syscall and eventually
calls kmalloc etc.), OOM recovery becomes very slow even if it doesn't
deadlock.

Unfortunately, vmscan needs much refactoring before this idea can be
applied, so I didn't include such fixes.

I mean, I'd like to implement a per-process OOM flag even if coredump
doesn't really need it.
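The per-process flag idea can be illustrated with a small user-space model: because the flag lives in the shared mm rather than in one thread, every thread of the victim sees it, including threads busy reclaiming memory. This is a sketch under assumptions: `reclaim_some_pages()` is a hypothetical stand-in for a vmscan loop (not how vmscan is actually structured), and C11 atomics stand in for the kernel's `set_bit()`/`test_bit()`.

```c
#include <assert.h>
#include <stdatomic.h>

#define MMF_OOM_KILLED 17	/* bit number taken from the patch below */

/* Hypothetical model of the flags word shared by all threads of a process. */
struct mm {
	atomic_ulong flags;
};

static void mm_set_flag(struct mm *mm, int bit)
{
	atomic_fetch_or(&mm->flags, 1UL << bit);
}

static int mm_test_flag(struct mm *mm, int bit)
{
	return (int)((atomic_load(&mm->flags) >> bit) & 1UL);
}

/*
 * Hypothetical reclaim loop: try to free up to 'want' pages, but give up
 * as soon as the process has been chosen by the OOM killer, so the thread
 * can exit and release its memory instead of reclaiming for a dying task.
 */
static int reclaim_some_pages(struct mm *mm, int want)
{
	int freed = 0;

	assert(want >= 0);
	while (freed < want) {
		if (mm_test_flag(mm, MMF_OOM_KILLED))
			break;
		freed++;	/* pretend one page was reclaimed */
	}
	return freed;
}
```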

So I created the MMF_OOM patch today. It is still just for discussion.

From f099e1ba6e7b5654b35b468c13e1ae9e4f182ea4 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 4 Jun 2010 18:56:56 +0900
Subject: [RFC][PATCH v2] oom: make coredump interruptible

If an OOM victim process is in the middle of a core dump, sending it SIGKILL
has no effect. Unfortunately, core dumping needs a relatively large amount
of memory, which means OOM and coredump can deadlock.

Therefore, the coredump logic should check whether the task has received
SIGKILL from the OOM killer.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/binfmt_elf.c       |    4 ++++
 include/linux/sched.h |    1 +
 mm/oom_kill.c         |    3 ++-
 3 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 535e763..2aca748 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2038,6 +2038,10 @@ static int elf_core_dump(struct coredump_params *cprm)
 				page_cache_release(page);
 			} else
 				stop = !dump_seek(cprm->file, PAGE_SIZE);
+
+			/* The task needs to exit ASAP if it was OOM-killed. */
+			if (test_bit(MMF_OOM_KILLED, &current->mm->flags))
+				stop = 1;
 			if (stop)
 				goto end_coredump;
 		}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8485aa2..53b7caa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -436,6 +436,7 @@ extern int get_dumpable(struct mm_struct *mm);
 #endif
 					/* leave room for more dump flags */
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
+#define MMF_OOM_KILLED		17	/* Killed by OOM */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2678a04..29850c4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -401,7 +401,6 @@ static int __oom_kill_process(struct task_struct *p, struct mem_cgroup *mem,
 		       K(p->mm->total_vm),
 		       K(get_mm_counter(p->mm, MM_ANONPAGES)),
 		       K(get_mm_counter(p->mm, MM_FILEPAGES)));
-	task_unlock(p);
 
 	/*
 	 * We give our sacrificial lamb high priority and access to
@@ -410,6 +409,8 @@ static int __oom_kill_process(struct task_struct *p, struct mem_cgroup *mem,
 	 */
 	p->rt.time_slice = HZ;
 	set_tsk_thread_flag(p, TIF_MEMDIE);
+	set_bit(MMF_OOM_KILLED, &p->mm->flags);
+	task_unlock(p);
 
 	force_sig(SIGKILL, p);
 
-- 
1.6.5.2
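The control flow the patch adds to `elf_core_dump()` can be modeled outside the kernel as follows. This is only a sketch: `dump_pages()` and the `write_page` callbacks are hypothetical stand-ins for the per-page loop shown in the hunk, and `write_then_oom()` simulates the OOM killer setting MMF_OOM_KILLED while the dump is in progress.

```c
#include <assert.h>
#include <stdatomic.h>

#define MMF_OOM_KILLED 17	/* bit number from the patch above */

struct mm {
	atomic_ulong flags;
};

/* Hypothetical per-page writer; returns nonzero on success. */
typedef int (*write_page_fn)(struct mm *mm, int index);

/*
 * Model of the patched loop: after each page, stop early if the page
 * could not be written *or* the OOM killer has marked this mm.
 * Returns how many pages were attempted before the loop ended.
 */
static int dump_pages(struct mm *mm, int npages, write_page_fn write_page)
{
	int i;

	assert(npages >= 0);
	for (i = 0; i < npages; i++) {
		int stop = !write_page(mm, i);

		if (atomic_load(&mm->flags) & (1UL << MMF_OOM_KILLED))
			stop = 1;
		if (stop)
			return i + 1;
	}
	return npages;
}

/* Example writer: always succeeds. */
static int write_ok(struct mm *mm, int index)
{
	(void)mm;
	(void)index;
	return 1;
}

/* Example writer that simulates the OOM killer firing while page 2 is written. */
static int write_then_oom(struct mm *mm, int index)
{
	if (index == 2)
		atomic_fetch_or(&mm->flags, 1UL << MMF_OOM_KILLED);
	return 1;
}
```

As in the patch, the check runs once per page, so the dump stops at the next page boundary after the flag is set rather than instantly.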




