[PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer


* [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
@ 2015-09-29 15:54 Oleg Nesterov
  2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-09-29 15:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

Just in case, this doesn't depend on the previous series I sent.

Tetsuo, iirc we already discussed the change in 1/2 some time ago,
could you review?

Oleg.



* [PATCH 1/2] coredump: ensure all coredumping tasks have
  2015-09-29 15:54 [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Oleg Nesterov
@ 2015-09-29 15:55 ` Oleg Nesterov
  2015-10-05 16:25   ` Michal Hocko
  2015-09-29 15:55 ` [PATCH 2/2] coredump: change zap_threads() and zap_process() to use Oleg Nesterov
  2015-09-30 11:49 ` [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Tetsuo Handa
  2 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2015-09-29 15:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

task_will_free_mem() is wrong in many ways, and in particular the
SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate
in the coredumping without SIGNAL_GROUP_COREDUMP bit set.

Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even
if other CLONE_VM processes can't react to SIGKILL. Fortunately, at
least the oom-kill case is fine; it kills all tasks sharing the same mm,
so it should also kill the process which actually dumps the core.

The change in prepare_signal() is not strictly necessary, it just
ensures that the patch does not bring another subtle behavioural
change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case
needs more changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/coredump.c   | 12 ++++++------
 kernel/signal.c |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 53d7d46..4fed8d0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -282,11 +282,13 @@ out:
 	return ispipe;
 }
 
-static int zap_process(struct task_struct *start, int exit_code)
+static int zap_process(struct task_struct *start, int exit_code, int flags)
 {
 	struct task_struct *t;
 	int nr = 0;
 
+	/* ignore all signals except SIGKILL, see prepare_signal() */
+	start->signal->flags = SIGNAL_GROUP_COREDUMP | flags;
 	start->signal->group_exit_code = exit_code;
 	start->signal->group_stop_count = 0;
 
@@ -313,10 +315,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
 	spin_lock_irq(&tsk->sighand->siglock);
 	if (!signal_group_exit(tsk->signal)) {
 		mm->core_state = core_state;
-		nr = zap_process(tsk, exit_code);
 		tsk->signal->group_exit_task = tsk;
-		/* ignore all signals except SIGKILL, see prepare_signal() */
-		tsk->signal->flags = SIGNAL_GROUP_COREDUMP;
+		nr = zap_process(tsk, exit_code, 0);
 		clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
 	}
 	spin_unlock_irq(&tsk->sighand->siglock);
@@ -367,8 +367,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
 			if (p->mm) {
 				if (unlikely(p->mm == mm)) {
 					lock_task_sighand(p, &flags);
-					nr += zap_process(p, exit_code);
-					p->signal->flags = SIGNAL_GROUP_EXIT;
+					nr += zap_process(p, exit_code,
+							  SIGNAL_GROUP_EXIT);
 					unlock_task_sighand(p, &flags);
 				}
 				break;
diff --git a/kernel/signal.c b/kernel/signal.c
index f2cbd4e..c0b01fe 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -788,7 +788,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 	sigset_t flush;
 
 	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
-		if (signal->flags & SIGNAL_GROUP_COREDUMP)
+		if (!(signal->flags & SIGNAL_GROUP_EXIT))
 			return sig == SIGKILL;
 		/*
 		 * The process is in the middle of dying, nothing to do.
-- 
2.4.3



* [PATCH 2/2] coredump: change zap_threads() and zap_process() to use
  2015-09-29 15:54 [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Oleg Nesterov
  2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov
@ 2015-09-29 15:55 ` Oleg Nesterov
  2015-09-30 11:49 ` [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Tetsuo Handa
  2 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-09-29 15:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

Change zap_threads() paths to use for_each_thread() rather than
while_each_thread().

While at it, change zap_threads() to avoid the nested if's to make
the code more readable and lessen the indentation.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/coredump.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 4fed8d0..b3c153c 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -292,15 +292,14 @@ static int zap_process(struct task_struct *start, int exit_code, int flags)
 	start->signal->group_exit_code = exit_code;
 	start->signal->group_stop_count = 0;
 
-	t = start;
-	do {
+	for_each_thread(start, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
 		if (t != current && t->mm) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
 			nr++;
 		}
-	} while_each_thread(start, t);
+	}
 
 	return nr;
 }
@@ -362,18 +361,18 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
 			continue;
 		if (g->flags & PF_KTHREAD)
 			continue;
-		p = g;
-		do {
-			if (p->mm) {
-				if (unlikely(p->mm == mm)) {
-					lock_task_sighand(p, &flags);
-					nr += zap_process(p, exit_code,
-							  SIGNAL_GROUP_EXIT);
-					unlock_task_sighand(p, &flags);
-				}
-				break;
+
+		for_each_thread(g, p) {
+			if (unlikely(!p->mm))
+				continue;
+			if (unlikely(p->mm == mm)) {
+				lock_task_sighand(p, &flags);
+				nr += zap_process(p, exit_code,
+							SIGNAL_GROUP_EXIT);
+				unlock_task_sighand(p, &flags);
 			}
-		} while_each_thread(g, p);
+			break;
+		}
 	}
 	rcu_read_unlock();
 done:
-- 
2.4.3



* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
  2015-09-29 15:54 [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Oleg Nesterov
  2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov
  2015-09-29 15:55 ` [PATCH 2/2] coredump: change zap_threads() and zap_process() to use Oleg Nesterov
@ 2015-09-30 11:49 ` Tetsuo Handa
  2015-09-30 14:15   ` Oleg Nesterov
  2 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2015-09-30 11:49 UTC (permalink / raw)
  To: oleg, akpm; +Cc: rientjes, kwalker, mhocko, skozina, linux-kernel

Oleg Nesterov wrote:
> Just in case, this doesn't depend on the previous series I sent.
> 
> Tetsuo, iirc we already discussed the change in 1/2 some time ago,
> could you review?
> 
> Oleg.

I tested patches 1/2 and 2/2 on next-20150929 using the reproducer at
http://lkml.kernel.org/r/201503150240.GII00591.OVSFtQLOFOHJMF@I-love.SAKURA.ne.jp .

  $ while :; do ./a.out; done

Unfortunately, since the hang on coredump to pipe occurs only sometimes,
I can't tell whether this patchset fixes it.


* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
  2015-09-30 11:49 ` [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Tetsuo Handa
@ 2015-09-30 14:15   ` Oleg Nesterov
  2015-09-30 16:12     ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2015-09-30 14:15 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel

On 09/30, Tetsuo Handa wrote:
>
> Oleg Nesterov wrote:
> > Just in case, this doesn't depend on the previous series I sent.
> >
> > Tetsuo, iirc we already discussed the change in 1/2 some time ago,
> > could you review?
> >
> > Oleg.
>
> I tested patches 1/2 and 2/2 on next-20150929 using the reproducer at
> http://lkml.kernel.org/r/201503150240.GII00591.OVSFtQLOFOHJMF@I-love.SAKURA.ne.jp .
>
>   $ while :; do ./a.out; done
>
> Unfortunately, since the hang on coredump to pipe occurs only sometimes,
> I can't tell whether this patchset fixes it.

Obviously it doesn't. There are a lot more problems here.

It is hardly possible to enumerate them, but let me quote the changelog
from d003f371b27016354c:

    Note: this is only the first step, this patch doesn't try to solve other
    problems.  The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can
    participate in coredump after it was already observed in PF_EXITING state,
    so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set.
    fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
    out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
    And even the name/usage of the new helper is confusing, an exiting thread
    can only free its ->mm if it is the only/last task in thread group.

This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
a bit more correct wrt CLONE_VM tasks, nothing more.

Oleg.
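
The effect described above can be sketched in user space. This is a toy model, not kernel code: the TOY_* bit values, the helper names, and the shape of toy_will_free_mem() (exiting and not dumping core) are assumptions made for illustration. Only the flag assignments and the two prepare_signal() conditions are taken from the diff in 1/2.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values (assumption); only their distinctness matters. */
#define TOY_GROUP_EXIT     0x4u
#define TOY_GROUP_COREDUMP 0x8u

/* signal->flags before 1/2: the dumper gets COREDUMP, CLONE_VM sharers get EXIT. */
static unsigned int flags_before_patch(bool is_dumper)
{
	return is_dumper ? TOY_GROUP_COREDUMP : TOY_GROUP_EXIT;
}

/* After 1/2, zap_process() sets SIGNAL_GROUP_COREDUMP | flags for everyone. */
static unsigned int flags_after_patch(bool is_dumper)
{
	return TOY_GROUP_COREDUMP | (is_dumper ? 0u : TOY_GROUP_EXIT);
}

/* Assumed shape of the oom-killer side check: exiting and not in a coredump. */
static bool toy_will_free_mem(bool exiting, unsigned int flags)
{
	return exiting && !(flags & TOY_GROUP_COREDUMP);
}

/* Old prepare_signal() condition for taking the "return sig == SIGKILL" branch. */
static bool old_sigkill_only(unsigned int flags)
{
	return (flags & TOY_GROUP_COREDUMP) != 0;
}

/* New condition from 1/2: inside the EXIT|COREDUMP branch, but EXIT is not set. */
static bool new_sigkill_only(unsigned int flags)
{
	return (flags & (TOY_GROUP_EXIT | TOY_GROUP_COREDUMP)) &&
	       !(flags & TOY_GROUP_EXIT);
}
```

In this model an exiting CLONE_VM sharer passes the toy check before the patch and fails it afterwards, which is the "bit more correct" part. The old prepare_signal() condition applied to the new sharer flags would start taking the SIGKILL-only branch, and the new condition avoids that, so only the dumping process takes it both before and after.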



* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
  2015-09-30 14:15   ` Oleg Nesterov
@ 2015-09-30 16:12     ` Tetsuo Handa
  2015-09-30 16:40       ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2015-09-30 16:12 UTC (permalink / raw)
  To: oleg; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel

Oleg Nesterov wrote:
> This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
> a bit more correct wrt CLONE_VM tasks, nothing more.

OK. Then that is beyond what I can understand. But I would like
PATCH 2/2 to describe why it changes from
"do { } while_each_thread()" to "for_each_thread() { }"
because they seem to traverse differently.



#define __for_each_thread(signal, t)    \
	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

#define for_each_thread(p, t)           \
	__for_each_thread((p)->signal, t)

static inline struct task_struct *next_thread(const struct task_struct *p)
{
	return list_entry_rcu(p->thread_group.next,
			      struct task_struct, thread_group);
}

#define while_each_thread(g, t) \
	while ((t = next_thread(t)) != g)


* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
  2015-09-30 16:12     ` Tetsuo Handa
@ 2015-09-30 16:40       ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2015-09-30 16:40 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel

On 10/01, Tetsuo Handa wrote:
>
> Oleg Nesterov wrote:
> > This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
> > a bit more correct wrt CLONE_VM tasks, nothing more.
>
> OK. Then that is beyond what I can understand. But I would like
> PATCH 2/2 to describe why it changes from
> "do { } while_each_thread()" to "for_each_thread() { }"

while_each_thread() is deprecated, see 0c740d0afc

> because they seem to traverse differently.

Not really. And in this particular case (start from group leader)
even the order is the same, although this doesn't matter. Well,
except for_each_thread(p, t) can find no threads, but this is fine
too; this means that they all (including the leader) have exited.

Oleg.
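
A user-space sketch of the two traversal shapes may make the point above concrete. Everything here is a toy stand-in: struct toy_task, struct toy_signal, and the NULL-terminated head list are assumptions for illustration, not the kernel's task_struct, signal_struct, or RCU list macros. It only models a ring walk that starts at a given thread versus a walk anchored at a per-process list head.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are not the kernel's task_struct/signal_struct. */
struct toy_task {
	int tid;
	struct toy_task *ring_next; /* circular, like ->thread_group */
	struct toy_task *node_next; /* head-anchored, like ->thread_node */
};

struct toy_signal {
	struct toy_task *thread_head; /* NULL when every thread has exited */
};

/* "do { } while_each_thread()" shape: always visits 'start' first. */
static int walk_ring(struct toy_task *start, int *out, int max)
{
	struct toy_task *t = start;
	int n = 0;

	do {
		if (n < max)
			out[n] = t->tid;
		n++;
	} while ((t = t->ring_next) != start);

	return n;
}

/* "for_each_thread() { }" shape: starts at the list head, may visit nothing. */
static int walk_head(const struct toy_signal *sig, int *out, int max)
{
	const struct toy_task *t;
	int n = 0;

	for (t = sig->thread_head; t != NULL; t = t->node_next) {
		if (n < max)
			out[n] = t->tid;
		n++;
	}

	return n;
}

/* Returns 1 if both walks, started from the leader, see the same order. */
static int toy_same_order_from_leader(void)
{
	struct toy_task a = { 1, NULL, NULL };
	struct toy_task b = { 2, NULL, NULL };
	struct toy_task c = { 3, NULL, NULL };
	struct toy_signal sig = { NULL };
	int ring[3], head[3], i;

	a.ring_next = &b; b.ring_next = &c; c.ring_next = &a; /* closed ring */
	a.node_next = &b; b.node_next = &c;                   /* leader first */
	sig.thread_head = &a;

	if (walk_ring(&a, ring, 3) != 3 || walk_head(&sig, head, 3) != 3)
		return 0;
	for (i = 0; i < 3; i++)
		if (ring[i] != head[i])
			return 0;
	return 1;
}

/* Number of threads the head-anchored walk sees once all have exited. */
static int toy_visits_when_all_exited(void)
{
	struct toy_signal sig = { NULL };
	int out[1];

	return walk_head(&sig, out, 1);
}
```

In this toy, starting the ring walk at the leader yields the same order as the head-anchored walk, and the head-anchored walk visits nothing when the list is empty, whereas the do/while shape always runs its body at least once for the starting thread.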



* Re: [PATCH 1/2] coredump: ensure all coredumping tasks have
  2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov
@ 2015-10-05 16:25   ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2015-10-05 16:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Rientjes, Kyle Walker, Stanislav Kozina,
	Tetsuo Handa, linux-kernel

The subject is truncated. It is missing SIGNAL_GROUP_COREDUMP.

On Tue 29-09-15 17:55:02, Oleg Nesterov wrote:
> task_will_free_mem() is wrong in many ways, and in particular the
> SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate
> in the coredumping without SIGNAL_GROUP_COREDUMP bit set.
> 
> Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even
> if other CLONE_VM processes can't react to SIGKILL. Fortunately, at
> least the oom-kill case is fine; it kills all tasks sharing the same mm,
> so it should also kill the process which actually dumps the core.

Yes, I do not think it will make much difference for the oom killer,
but it is much better to handle all the processes sharing the mm the
same way.

> 
> The change in prepare_signal() is not strictly necessary, it just
> ensures that the patch does not bring another subtle behavioural
> change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case
> needs more changes.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  fs/coredump.c   | 12 ++++++------
>  kernel/signal.c |  2 +-
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 53d7d46..4fed8d0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -282,11 +282,13 @@ out:
>  	return ispipe;
>  }
>  
> -static int zap_process(struct task_struct *start, int exit_code)
> +static int zap_process(struct task_struct *start, int exit_code, int flags)
>  {
>  	struct task_struct *t;
>  	int nr = 0;
>  
> +	/* ignore all signals except SIGKILL, see prepare_signal() */
> +	start->signal->flags = SIGNAL_GROUP_COREDUMP | flags;
>  	start->signal->group_exit_code = exit_code;
>  	start->signal->group_stop_count = 0;
>  
> @@ -313,10 +315,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
>  	spin_lock_irq(&tsk->sighand->siglock);
>  	if (!signal_group_exit(tsk->signal)) {
>  		mm->core_state = core_state;
> -		nr = zap_process(tsk, exit_code);
>  		tsk->signal->group_exit_task = tsk;
> -		/* ignore all signals except SIGKILL, see prepare_signal() */
> -		tsk->signal->flags = SIGNAL_GROUP_COREDUMP;
> +		nr = zap_process(tsk, exit_code, 0);
>  		clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
>  	}
>  	spin_unlock_irq(&tsk->sighand->siglock);
> @@ -367,8 +367,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
>  			if (p->mm) {
>  				if (unlikely(p->mm == mm)) {
>  					lock_task_sighand(p, &flags);
> -					nr += zap_process(p, exit_code);
> -					p->signal->flags = SIGNAL_GROUP_EXIT;
> +					nr += zap_process(p, exit_code,
> +							  SIGNAL_GROUP_EXIT);
>  					unlock_task_sighand(p, &flags);
>  				}
>  				break;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f2cbd4e..c0b01fe 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -788,7 +788,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>  	sigset_t flush;
>  
>  	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
> -		if (signal->flags & SIGNAL_GROUP_COREDUMP)
> +		if (!(signal->flags & SIGNAL_GROUP_EXIT))
>  			return sig == SIGKILL;
>  		/*
>  		 * The process is in the middle of dying, nothing to do.
> -- 
> 2.4.3

-- 
Michal Hocko
SUSE Labs


