* [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer @ 2015-09-29 15:54 Oleg Nesterov 2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Oleg Nesterov @ 2015-09-29 15:54 UTC (permalink / raw) To: Andrew Morton Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina, Tetsuo Handa, linux-kernel Just in case, this doesn't depend on the previous series I sent. Tetsuo, iirc we already discussed the change in 1/2 some time ago, could you review? Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] coredump: ensure all coredumping tasks have 2015-09-29 15:54 [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Oleg Nesterov @ 2015-09-29 15:55 ` Oleg Nesterov 2015-10-05 16:25 ` Michal Hocko 2015-09-29 15:55 ` [PATCH 2/2] coredump: change zap_threads() and zap_process() to use Oleg Nesterov 2015-09-30 11:49 ` [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Tetsuo Handa 2 siblings, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2015-09-29 15:55 UTC (permalink / raw) To: Andrew Morton Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina, Tetsuo Handa, linux-kernel

task_will_free_mem() is wrong in many ways, and in particular the SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the coredumping without the SIGNAL_GROUP_COREDUMP bit set.

Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if other CLONE_VM processes can't react to SIGKILL. Fortunately, at least the oom-kill case is fine; it kills all tasks sharing the same mm, so it should also kill the process which actually dumps the core.

The change in prepare_signal() is not strictly necessary, it just ensures that the patch does not bring another subtle behavioural change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/coredump.c | 12 ++++++------ kernel/signal.c | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 53d7d46..4fed8d0 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -282,11 +282,13 @@ out: return ispipe; } -static int zap_process(struct task_struct *start, int exit_code) +static int zap_process(struct task_struct *start, int exit_code, int flags) { struct task_struct *t; int nr = 0; + /* ignore all signals except SIGKILL, see prepare_signal() */ + start->signal->flags = SIGNAL_GROUP_COREDUMP | flags; start->signal->group_exit_code = exit_code; start->signal->group_stop_count = 0; @@ -313,10 +315,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, spin_lock_irq(&tsk->sighand->siglock); if (!signal_group_exit(tsk->signal)) { mm->core_state = core_state; - nr = zap_process(tsk, exit_code); tsk->signal->group_exit_task = tsk; - /* ignore all signals except SIGKILL, see prepare_signal() */ - tsk->signal->flags = SIGNAL_GROUP_COREDUMP; + nr = zap_process(tsk, exit_code, 0); clear_tsk_thread_flag(tsk, TIF_SIGPENDING); } spin_unlock_irq(&tsk->sighand->siglock); @@ -367,8 +367,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, if (p->mm) { if (unlikely(p->mm == mm)) { lock_task_sighand(p, &flags); - nr += zap_process(p, exit_code); - p->signal->flags = SIGNAL_GROUP_EXIT; + nr += zap_process(p, exit_code, + SIGNAL_GROUP_EXIT); unlock_task_sighand(p, &flags); } break; diff --git a/kernel/signal.c b/kernel/signal.c index f2cbd4e..c0b01fe 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -788,7 +788,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) sigset_t flush; if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) { - if (signal->flags & SIGNAL_GROUP_COREDUMP) + if (!(signal->flags & SIGNAL_GROUP_EXIT)) return sig == SIGKILL; /* * The process is in the middle of dying, nothing to do. 
-- 2.4.3
* Re: [PATCH 1/2] coredump: ensure all coredumping tasks have 2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov @ 2015-10-05 16:25 ` Michal Hocko 0 siblings, 0 replies; 8+ messages in thread From: Michal Hocko @ 2015-10-05 16:25 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, David Rientjes, Kyle Walker, Stanislav Kozina, Tetsuo Handa, linux-kernel The subject is truncated. It is missing SIGNAL_GROUP_COREDUMP On Tue 29-09-15 17:55:02, Oleg Nesterov wrote: > task_will_free_mem() is wrong in many ways, and in particular the > SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate > in the coredumping without SIGNAL_GROUP_COREDUMP bit set. > > change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even > if other CLONE_VM processes can't react to SIGKILL. Fortunately, at > least oom-kill case if fine; it kills all tasks sharing the same mm, > so it should also kill the process which actually dumps the core. Yes I do not think it will make too much difference for the oom killer but it is much better to handle all the processes sharing the mm the same way. > > The change in prepare_signal() is not strictly necessary, it just > ensures that the patch does not bring another subtle behavioural > change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case > needs more changes. 
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> > --- > fs/coredump.c | 12 ++++++------ > kernel/signal.c | 2 +- > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 53d7d46..4fed8d0 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -282,11 +282,13 @@ out: > return ispipe; > } > > -static int zap_process(struct task_struct *start, int exit_code) > +static int zap_process(struct task_struct *start, int exit_code, int flags) > { > struct task_struct *t; > int nr = 0; > > + /* ignore all signals except SIGKILL, see prepare_signal() */ > + start->signal->flags = SIGNAL_GROUP_COREDUMP | flags; > start->signal->group_exit_code = exit_code; > start->signal->group_stop_count = 0; > > @@ -313,10 +315,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, > spin_lock_irq(&tsk->sighand->siglock); > if (!signal_group_exit(tsk->signal)) { > mm->core_state = core_state; > - nr = zap_process(tsk, exit_code); > tsk->signal->group_exit_task = tsk; > - /* ignore all signals except SIGKILL, see prepare_signal() */ > - tsk->signal->flags = SIGNAL_GROUP_COREDUMP; > + nr = zap_process(tsk, exit_code, 0); > clear_tsk_thread_flag(tsk, TIF_SIGPENDING); > } > spin_unlock_irq(&tsk->sighand->siglock); > @@ -367,8 +367,8 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, > if (p->mm) { > if (unlikely(p->mm == mm)) { > lock_task_sighand(p, &flags); > - nr += zap_process(p, exit_code); > - p->signal->flags = SIGNAL_GROUP_EXIT; > + nr += zap_process(p, exit_code, > + SIGNAL_GROUP_EXIT); > unlock_task_sighand(p, &flags); > } > break; > diff --git a/kernel/signal.c b/kernel/signal.c > index f2cbd4e..c0b01fe 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -788,7 +788,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) > sigset_t flush; > > if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) { > - if 
(signal->flags & SIGNAL_GROUP_COREDUMP) > + if (!(signal->flags & SIGNAL_GROUP_EXIT)) > return sig == SIGKILL; > /* > * The process is in the middle of dying, nothing to do. > -- > 2.4.3 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] coredump: change zap_threads() and zap_process() to use 2015-09-29 15:54 [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Oleg Nesterov 2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov @ 2015-09-29 15:55 ` Oleg Nesterov 2015-09-30 11:49 ` [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Tetsuo Handa 2 siblings, 0 replies; 8+ messages in thread From: Oleg Nesterov @ 2015-09-29 15:55 UTC (permalink / raw) To: Andrew Morton Cc: David Rientjes, Kyle Walker, Michal Hocko, Stanislav Kozina, Tetsuo Handa, linux-kernel Change zap_threads() paths to use for_each_thread() rather than while_each_thread(). While at it, change zap_threads() to avoid the nested if's to make the code more readable and lessen the indentation. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/coredump.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 4fed8d0..b3c153c 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -292,15 +292,14 @@ static int zap_process(struct task_struct *start, int exit_code, int flags) start->signal->group_exit_code = exit_code; start->signal->group_stop_count = 0; - t = start; - do { + for_each_thread(start, t) { task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); if (t != current && t->mm) { sigaddset(&t->pending.signal, SIGKILL); signal_wake_up(t, 1); nr++; } - } while_each_thread(start, t); + } return nr; } @@ -362,18 +361,18 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, continue; if (g->flags & PF_KTHREAD) continue; - p = g; - do { - if (p->mm) { - if (unlikely(p->mm == mm)) { - lock_task_sighand(p, &flags); - nr += zap_process(p, exit_code, - SIGNAL_GROUP_EXIT); - unlock_task_sighand(p, &flags); - } - break; + + for_each_thread(g, p) { + if (unlikely(!p->mm)) + continue; + if (unlikely(p->mm == mm)) { + lock_task_sighand(p, &flags); + nr += 
zap_process(p, exit_code, + SIGNAL_GROUP_EXIT); + unlock_task_sighand(p, &flags); } - } while_each_thread(g, p); + break; + } } rcu_read_unlock(); done: -- 2.4.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer 2015-09-29 15:54 [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Oleg Nesterov 2015-09-29 15:55 ` [PATCH 1/2] coredump: ensure all coredumping tasks have Oleg Nesterov 2015-09-29 15:55 ` [PATCH 2/2] coredump: change zap_threads() and zap_process() to use Oleg Nesterov @ 2015-09-30 11:49 ` Tetsuo Handa 2015-09-30 14:15 ` Oleg Nesterov 2 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2015-09-30 11:49 UTC (permalink / raw) To: oleg, akpm; +Cc: rientjes, kwalker, mhocko, skozina, linux-kernel Oleg Nesterov wrote: > Just in case, this doesn't depend on the previous series I sent. > > Tetsuo, iirc we already discussed the change in 1/2 some time ago, > could you review? > > Oleg. I tested patch 1/2 and 2/2 on next-20150929 using reproducer at http://lkml.kernel.org/r/201503150240.GII00591.OVSFtQLOFOHJMF@I-love.SAKURA.ne.jp . $ while :; do ./a.out; done Unfortunately, since hangup on coredump to pipe occurs sometimes, I can't tell whether this patchset solves hangup on coredump to pipe. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer 2015-09-30 11:49 ` [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer Tetsuo Handa @ 2015-09-30 14:15 ` Oleg Nesterov 2015-09-30 16:12 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2015-09-30 14:15 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel On 09/30, Tetsuo Handa wrote: > > Oleg Nesterov wrote: > > Just in case, this doesn't depend on the previous series I sent. > > > > Tetsuo, iirc we already discussed the change in 1/2 some time ago, > > could you review? > > > > Oleg. > > I tested patch 1/2 and 2/2 on next-20150929 using reproducer at > http://lkml.kernel.org/r/201503150240.GII00591.OVSFtQLOFOHJMF@I-love.SAKURA.ne.jp . > > $ while :; do ./a.out; done > > Unfortunately, since hangup on coredump to pipe occurs sometimes, > I can't tell whether this patchset solves hangup on coredump to pipe. Obviously it doesn't. There are a lot more problems here. It is hardly possible to enumerate them, but let me quote the changelog from d003f371b27016354c Note: this is only the first step, this patch doesn't try to solve other problems. The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can participate in coredump after it was already observed in PF_EXITING state, so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set. fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it. And even the name/usage of the new helper is confusing, an exiting thread can only free its ->mm if it is the only/last task in thread group. This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem() a bit more correct wrt CLONE_VM tasks, nothing more. Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer 2015-09-30 14:15 ` Oleg Nesterov @ 2015-09-30 16:12 ` Tetsuo Handa 2015-09-30 16:40 ` Oleg Nesterov 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2015-09-30 16:12 UTC (permalink / raw) To: oleg; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel

Oleg Nesterov wrote:
> This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem()
> a bit more correct wrt CLONE_VM tasks, nothing more.

OK. Then, that's out of what I can understand. But I wish for
some description to PATCH 2/2 about why to change from
"do { } while_each_thread()" to "for_each_thread() { }"
because they seem to traverse differently.

  #define __for_each_thread(signal, t) \
          list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

  #define for_each_thread(p, t) \
          __for_each_thread((p)->signal, t)

  static inline struct task_struct *next_thread(const struct task_struct *p)
  {
          return list_entry_rcu(p->thread_group.next,
                                struct task_struct, thread_group);
  }

  #define while_each_thread(g, t) \
          while ((t = next_thread(t)) != g)

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] coredump: make SIGNAL_GROUP_COREDUMP more friendly to oom-killer 2015-09-30 16:12 ` Tetsuo Handa @ 2015-09-30 16:40 ` Oleg Nesterov 0 siblings, 0 replies; 8+ messages in thread From: Oleg Nesterov @ 2015-09-30 16:40 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, rientjes, kwalker, mhocko, skozina, linux-kernel On 10/01, Tetsuo Handa wrote: > > Oleg Nesterov wrote: > > This patch just makes the SIGNAL_GROUP_COREDUMP check in task_will_free_mem() > > a bit more correct wrt CLONE_VM tasks, nothing more. > > OK. Then, that's out of what I can understand. But I wish for > some description to PATCH 2/2 about why to change from > "do { } while_each_thread()" to "for_each_thread() { }" while_each_thread() is deprecated, see 0c740d0afc > because they seem to traverse differently. Not really. And in this particular case (start from group leader) even the order is the same, although this doesn't matter. Well, except for_each_thread(p, t) can find no threads, but this is fine too; this means that they all (including the leader) have exited. Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread