* [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
From: Oleg Nesterov @ 2015-09-29 15:54 UTC
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
Just in case, this doesn't depend on the previous series I sent.
Tetsuo, iirc we already discussed the change in 1/2 some time ago,
could you review?
Oleg.
* [PATCH 1/2] coredump: ensure all coredumping tasks have
From: Oleg Nesterov @ 2015-09-29 15:55 UTC
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
task_will_free_mem() is wrong in many ways, and in particular the
SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate
in the coredumping without SIGNAL_GROUP_COREDUMP bit set.
Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even
if other CLONE_VM processes can't react to SIGKILL. Fortunately, at
least the oom-kill case is fine; it kills all tasks sharing the same mm,
so it should also kill the process which actually dumps the core.
The change in prepare_signal() is not strictly necessary, it just
ensures that the patch does not bring another subtle behavioural
change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case
needs more changes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/coredump.c | 12 ++++++------
kernel/signal.c | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 53d7d46..4fed8d0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -282,11 +282,13 @@ out:
return ispipe;
}
-static int zap_process(struct task_struct *start, int exit_code)
+static int zap_process(struct task_struct *start, int exit_code, int flags)
{
struct task_struct *t;
int nr = 0;
+ /* ignore all signals except SIGKILL, see prepare_signal() */
+ start->signal->flags = SIGNAL_GROUP_COREDUMP | flags;
start->signal->group_exit_code = exit_code;
start->signal->group_stop_count = 0;
@@ -313,10 +315,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
spin_lock_irq(&tsk->sighand->siglock);
if (!signal_group_exit(tsk->signal)) {
mm->core_state = core_state;
- nr = zap_process(tsk, exit_code);
tsk->signal->group_exit_task = tsk;
- /* ignore all signals except SIGKILL, see prepare_signal() */
- tsk->signal->flags = SIGNAL_GROUP_COREDUMP;
+ nr = zap_process(tsk, exit_code, 0);
clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
}
spin_unlock_irq(&tsk->sighand->siglock);
@@ -367,8 +367,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
if (p->mm) {
if (unlikely(p->mm == mm)) {
lock_task_sighand(p, &flags);
- nr += zap_process(p, exit_code);
- p->signal->flags = SIGNAL_GROUP_EXIT;
+ nr += zap_process(p, exit_code,
+ SIGNAL_GROUP_EXIT);
unlock_task_sighand(p, &flags);
}
break;
diff --git a/kernel/signal.c b/kernel/signal.c
index f2cbd4e..c0b01fe 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -788,7 +788,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
sigset_t flush;
if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
- if (signal->flags & SIGNAL_GROUP_COREDUMP)
+ if (!(signal->flags & SIGNAL_GROUP_EXIT))
return sig == SIGKILL;
/*
* The process is in the middle of dying, nothing to do.
--
2.4.3
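The effect of the prepare_signal() change can be checked with a small user-space sketch. It is not kernel code: the bit values and the helper names (sigkill_only_old, sigkill_only_new, compare_checks) are assumptions made for illustration, and only the two conditions themselves are taken from the diff. Before the patch, other CLONE_VM processes ended up with SIGNAL_GROUP_EXIT alone; after it they carry both bits, so the old inner check would newly send them down the "return sig == SIGKILL" branch, while the new check keeps them out of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values (assumption); the real ones come from the kernel headers. */
#define SIGNAL_GROUP_EXIT	0x4u
#define SIGNAL_GROUP_COREDUMP	0x8u

/* Inner check before the patch: true means "return sig == SIGKILL" is taken. */
static bool sigkill_only_old(unsigned int flags)
{
	return (flags & SIGNAL_GROUP_COREDUMP) != 0;
}

/* Inner check after the patch. */
static bool sigkill_only_new(unsigned int flags)
{
	return !(flags & SIGNAL_GROUP_EXIT);
}

/* Walk the flag states that zap_process() can leave behind. */
static void compare_checks(void)
{
	unsigned int dumper = SIGNAL_GROUP_COREDUMP;
	unsigned int other_before = SIGNAL_GROUP_EXIT;
	unsigned int other_after = SIGNAL_GROUP_COREDUMP | SIGNAL_GROUP_EXIT;

	/* The dumping group takes the SIGKILL-only branch under either check. */
	assert(sigkill_only_old(dumper) && sigkill_only_new(dumper));

	/* Before the patch, other CLONE_VM processes skipped that branch. */
	assert(!sigkill_only_old(other_before));

	/* With both bits set, only the new check still skips it. */
	assert(sigkill_only_old(other_after));
	assert(!sigkill_only_new(other_after));
}
```

Calling compare_checks() from a test program exercises all three states; the last pair of assertions is the "subtle behavioural change" that the one-line signal.c hunk avoids.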
* [PATCH 2/2] coredump: change zap_threads() and zap_process() to use
From: Oleg Nesterov @ 2015-09-29 15:55 UTC
To: Andrew Morton
Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina,
Tetsuo Handa, linux-kernel
Change zap_threads() paths to use for_each_thread() rather than
while_each_thread().
While at it, change zap_threads() to avoid the nested if's to make
the code more readable and lessen the indentation.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/coredump.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 4fed8d0..b3c153c 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -292,15 +292,14 @@ static int zap_process(struct task_struct *start, int exit_code, int flags)
start->signal->group_exit_code = exit_code;
start->signal->group_stop_count = 0;
- t = start;
- do {
+ for_each_thread(start, t) {
task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
if (t != current && t->mm) {
sigaddset(&t->pending.signal, SIGKILL);
signal_wake_up(t, 1);
nr++;
}
- } while_each_thread(start, t);
+ }
return nr;
}
@@ -362,18 +361,18 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
continue;
if (g->flags & PF_KTHREAD)
continue;
- p = g;
- do {
- if (p->mm) {
- if (unlikely(p->mm == mm)) {
- lock_task_sighand(p, &flags);
- nr += zap_process(p, exit_code,
- SIGNAL_GROUP_EXIT);
- unlock_task_sighand(p, &flags);
- }
- break;
+
+ for_each_thread(g, p) {
+ if (unlikely(!p->mm))
+ continue;
+ if (unlikely(p->mm == mm)) {
+ lock_task_sighand(p, &flags);
+ nr += zap_process(p, exit_code,
+ SIGNAL_GROUP_EXIT);
+ unlock_task_sighand(p, &flags);
}
- } while_each_thread(g, p);
+ break;
+ }
}
rcu_read_unlock();
done:
--
2.4.3
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
From: Tetsuo Handa @ 2015-09-30 11:49 UTC
To: oleg, akpm; +Cc: rientjes, kwalker, mhocko, skozina, linux-kernel
Oleg Nesterov wrote:
> Just in case, this doesn't depend on the previous series I sent.
>
> Tetsuo, iirc we already discussed the change in 1/2 some time ago,
> could you review?
>
> Oleg.
I tested patches 1/2 and 2/2 on next-20150929 using the reproducer at
http://lkml.kernel.org/r/201503150240.GII00591.OVSFtQLOFOHJMF@I-love.SAKURA.ne.jp .
$ while :; do ./a.out; done
Unfortunately, since the hang on coredump to a pipe occurs sometimes,
I can't tell whether this patchset solves it.
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
From: Oleg Nesterov @ 2015-09-30 14:15 UTC
To: Tetsuo Handa; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel
On 09/30, Tetsuo Handa wrote:
>
> Oleg Nesterov wrote:
> > Just in case, this doesn't depend on the previous series I sent.
> >
> > Tetsuo, iirc we already discussed the change in 1/2 some time ago,
> > could you review?
> >
> > Oleg.
>
> I tested patches 1/2 and 2/2 on next-20150929 using the reproducer at
> http://lkml.kernel.org/r/201503150240.GII00591.OVSFtQLOFOHJMF@I-love.SAKURA.ne.jp .
>
> $ while :; do ./a.out; done
>
> Unfortunately, since the hang on coredump to a pipe occurs sometimes,
> I can't tell whether this patchset solves it.
Obviously it doesn't. There are a lot more problems here.
It is hardly possible to enumerate them, but let me quote the changelog
from d003f371b27016354c:

	Note: this is only the first step, this patch doesn't try to solve other
	problems. The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can
	participate in coredump after it was already observed in PF_EXITING state,
	so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set.
	fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
	out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
	And even the name/usage of the new helper is confusing, an exiting thread
	can only free its ->mm if it is the only/last task in thread group.

This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
a bit more correct wrt CLONE_VM tasks, nothing more.
Oleg.
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
From: Tetsuo Handa @ 2015-09-30 16:12 UTC
To: oleg; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel
Oleg Nesterov wrote:
> This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
> a bit more correct wrt CLONE_VM tasks, nothing more.
OK. Then that is beyond what I can understand. But I would like
PATCH 2/2 to describe why it changes from
"do { } while_each_thread()" to "for_each_thread() { }",
because the two seem to traverse differently.

#define __for_each_thread(signal, t)	\
	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

#define for_each_thread(p, t)		\
	__for_each_thread((p)->signal, t)

static inline struct task_struct *next_thread(const struct task_struct *p)
{
	return list_entry_rcu(p->thread_group.next,
			      struct task_struct, thread_group);
}

#define while_each_thread(g, t) \
	while ((t = next_thread(t)) != g)
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer
From: Oleg Nesterov @ 2015-09-30 16:40 UTC
To: Tetsuo Handa; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel
On 10/01, Tetsuo Handa wrote:
>
> Oleg Nesterov wrote:
> > This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
> > a bit more correct wrt CLONE_VM tasks, nothing more.
>
> OK. Then, that's out of what I can understand. But I wish for
> some description to PATCH 2/2 about why to change from
> "do { } while_each_thread()" to "for_each_thread() { }"
while_each_thread() is deprecated, see 0c740d0afc
> because they seem to traverse differently.
Not really. And in this particular case (start from group leader)
even the order is the same, although this doesn't matter. Well,
except for_each_thread(p, t) can find no threads, but this is fine
too; this means that they all (including the leader) have exited.
Oleg.
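The traversal point above can be illustrated with a minimal user-space sketch. It is only a model under stated assumptions: struct thread, struct group, group_add(), walk_ring() and walk_list() are hypothetical names, plain pointers stand in for the kernel's RCU lists, and tail insertion into both structures is assumed. walk_ring() mimics the do { } while_each_thread() shape (a ring walked from a starting thread), walk_list() mimics for_each_thread() (a list anchored outside any thread).

```c
#include <assert.h>
#include <stddef.h>

struct thread {
	int id;
	struct thread *ring_next;	/* circular link, models ->thread_group */
	struct thread *list_next;	/* head-anchored link, models ->thread_node */
};

struct group {
	struct thread *first;		/* models signal->thread_head */
	struct thread *last;
};

/* Append t to both structures (tail insertion is an assumption of this model). */
static void group_add(struct group *g, struct thread *t)
{
	t->list_next = NULL;
	if (!g->first) {
		g->first = g->last = t;
		t->ring_next = t;	/* a lone thread points at itself */
		return;
	}
	g->last->list_next = t;
	g->last->ring_next = t;
	t->ring_next = g->first;	/* close the ring back to the leader */
	g->last = t;
}

/* do { } while_each_thread() shape: always visits 'start' at least once. */
static size_t walk_ring(struct thread *start, int *out)
{
	struct thread *t = start;
	size_t n = 0;

	do {
		out[n++] = t->id;
	} while ((t = t->ring_next) != start);
	return n;
}

/* for_each_thread() shape: anchored at the group, may visit nothing. */
static size_t walk_list(const struct group *g, int *out)
{
	size_t n = 0;

	for (struct thread *t = g->first; t; t = t->list_next)
		out[n++] = t->id;
	return n;
}

/* Starting from the leader, both walks see the same threads in the same order. */
static void compare_walks(void)
{
	struct group g = { NULL, NULL };
	struct thread t[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };
	int a[3], b[3];

	assert(walk_list(&g, b) == 0);	/* empty group: nothing visited */
	for (int i = 0; i < 3; i++)
		group_add(&g, &t[i]);
	assert(walk_ring(&t[0], a) == 3 && walk_list(&g, b) == 3);
	for (int i = 0; i < 3; i++)
		assert(a[i] == b[i]);
}
```

In this model the only observable difference is the empty case: the head-anchored walk can return zero entries, which corresponds to the "can find no threads" remark, whereas the ring walk always reports its starting thread.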
* Re: [PATCH 1/2] coredump: ensure all coredumping tasks have
From: Michal Hocko @ 2015-10-05 16:25 UTC
To: Oleg Nesterov
Cc: Andrew Morton, David Rientjes, Kyle Walker, Stanislav Kozina,
Tetsuo Handa, linux-kernel
The subject is truncated. It is missing SIGNAL_GROUP_COREDUMP
On Tue 29-09-15 17:55:02, Oleg Nesterov wrote:
> task_will_free_mem() is wrong in many ways, and in particular the
> SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate
> in the coredumping without SIGNAL_GROUP_COREDUMP bit set.
>
> Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even
> if other CLONE_VM processes can't react to SIGKILL. Fortunately, at
> least the oom-kill case is fine; it kills all tasks sharing the same mm,
> so it should also kill the process which actually dumps the core.
Yes, I do not think it will make much difference for the oom killer,
but it is much better to handle all the processes sharing the mm the
same way.
>
> The change in prepare_signal() is not strictly necessary, it just
> ensures that the patch does not bring another subtle behavioural
> change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case
> needs more changes.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> fs/coredump.c | 12 ++++++------
> kernel/signal.c | 2 +-
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 53d7d46..4fed8d0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -282,11 +282,13 @@ out:
> return ispipe;
> }
>
> -static int zap_process(struct task_struct *start, int exit_code)
> +static int zap_process(struct task_struct *start, int exit_code, int flags)
> {
> struct task_struct *t;
> int nr = 0;
>
> + /* ignore all signals except SIGKILL, see prepare_signal() */
> + start->signal->flags = SIGNAL_GROUP_COREDUMP | flags;
> start->signal->group_exit_code = exit_code;
> start->signal->group_stop_count = 0;
>
> @@ -313,10 +315,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
> spin_lock_irq(&tsk->sighand->siglock);
> if (!signal_group_exit(tsk->signal)) {
> mm->core_state = core_state;
> - nr = zap_process(tsk, exit_code);
> tsk->signal->group_exit_task = tsk;
> - /* ignore all signals except SIGKILL, see prepare_signal() */
> - tsk->signal->flags = SIGNAL_GROUP_COREDUMP;
> + nr = zap_process(tsk, exit_code, 0);
> clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
> }
> spin_unlock_irq(&tsk->sighand->siglock);
> @@ -367,8 +367,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
> if (p->mm) {
> if (unlikely(p->mm == mm)) {
> lock_task_sighand(p, &flags);
> - nr += zap_process(p, exit_code);
> - p->signal->flags = SIGNAL_GROUP_EXIT;
> + nr += zap_process(p, exit_code,
> + SIGNAL_GROUP_EXIT);
> unlock_task_sighand(p, &flags);
> }
> break;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f2cbd4e..c0b01fe 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -788,7 +788,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
> sigset_t flush;
>
> if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
> - if (signal->flags & SIGNAL_GROUP_COREDUMP)
> + if (!(signal->flags & SIGNAL_GROUP_EXIT))
> return sig == SIGKILL;
> /*
> * The process is in the middle of dying, nothing to do.
> --
> 2.4.3
--
Michal Hocko
SUSE Labs