* [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread
@ 2016-09-21 15:43 Roman Pen
2016-09-21 15:43 ` [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead Roman Pen
2016-10-20 23:07 ` [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread Andy Lutomirski
0 siblings, 2 replies; 9+ messages in thread
From: Roman Pen @ 2016-09-21 15:43 UTC (permalink / raw)
Cc: Roman Pen, Andy Lutomirski, Josh Poimboeuf, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Tejun Heo, x86, linux-kernel
A kthread keeps a completion structure on its stack and is woken up
through it when vfork_done is completed.
In commit 2deb4be28 Andy Lutomirski made the oops path rewind the stack
unconditionally, so a later completion of task->vfork_done for any
kthread corrupts the stack (or spins forever trying to lock garbage
memory).
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
arch/x86/kernel/dumpstack.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index e0648f7..74be764 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -250,9 +250,14 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
/*
* We're not going to return, but we might be on an IST stack or
* have very little stack space left. Rewind the stack and kill
- * the task.
+ * the task. But kthread is a special case, since kthread uses
+ * stack to keep completion structure to be woken on vfork_done
+ * completion.
*/
- rewind_stack_do_exit(signr);
+ if (current->flags & PF_KTHREAD)
+ do_exit(signr);
+ else
+ rewind_stack_do_exit(signr);
}
NOKPROBE_SYMBOL(oops_end);
--
2.9.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
2016-09-21 15:43 [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread Roman Pen
@ 2016-09-21 15:43 ` Roman Pen
2016-10-20 23:08 ` Andy Lutomirski
2016-10-21 5:39 ` Peter Zijlstra
2016-10-20 23:07 ` [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread Andy Lutomirski
1 sibling, 2 replies; 9+ messages in thread
From: Roman Pen @ 2016-09-21 15:43 UTC (permalink / raw)
Cc: Roman Pen, Andy Lutomirski, Josh Poimboeuf, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Tejun Heo, linux-kernel
If panic_on_oops is not set and an oops happens inside a workqueue
kthread, the kernel kills that kthread. This patch fixes the recursive
GPF which then occurs, with the following stack:
[<ffffffff81397f75>] dump_stack+0x68/0x93
[<ffffffff8106954b>] ? do_exit+0x7ab/0xc10
[<ffffffff8108fd73>] __schedule_bug+0x83/0xe0
[<ffffffff81716d5a>] __schedule+0x7ea/0xba0
[<ffffffff810c864f>] ? vprintk_default+0x1f/0x30
[<ffffffff8116a63c>] ? printk+0x48/0x50
[<ffffffff81717150>] schedule+0x40/0x90
[<ffffffff8106976a>] do_exit+0x9ca/0xc10
[<ffffffff810c8e3d>] ? kmsg_dump+0x11d/0x190
[<ffffffff810c8d37>] ? kmsg_dump+0x17/0x190
[<ffffffff81021ee9>] oops_end+0x99/0xd0
[<ffffffff81052da5>] no_context+0x185/0x3e0
[<ffffffff81053083>] __bad_area_nosemaphore+0x83/0x1c0
[<ffffffff810c820e>] ? vprintk_emit+0x25e/0x530
[<ffffffff810531d4>] bad_area_nosemaphore+0x14/0x20
[<ffffffff8105355c>] __do_page_fault+0xac/0x570
[<ffffffff810c66fe>] ? console_trylock+0x1e/0xe0
[<ffffffff81002036>] ? trace_hardirqs_off_thunk+0x1a/0x1c
[<ffffffff81053a2c>] do_page_fault+0xc/0x10
[<ffffffff8171f812>] page_fault+0x22/0x30
[<ffffffff81089bc3>] ? kthread_data+0x33/0x40
[<ffffffff8108427e>] ? wq_worker_sleeping+0xe/0x80
[<ffffffff817169eb>] __schedule+0x47b/0xba0
[<ffffffff81717150>] schedule+0x40/0x90
[<ffffffff8106957d>] do_exit+0x7dd/0xc10
[<ffffffff81021ee9>] oops_end+0x99/0xd0
The root cause is that the zeroed task->vfork_done member is accessed
from the wq_worker_sleeping() hook. It is zeroed out on the following
path:
oops_end()
do_exit()
exit_mm()
mm_release()
complete_vfork_done()
To fix the bug, dead tasks must be ignored.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
kernel/sched/core.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2c303e7..50772e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
* If a worker went to sleep, notify and ask workqueue
* whether it wants to wake up a task to maintain
* concurrency.
+ *
+ * Also the following stack is possible:
+ * oops_end()
+ * do_exit()
+ * schedule()
+ *
+ * If panic_on_oops is not set and an oops happens on
+ * a workqueue execution path, the thread will be killed.
+ * That is definitely sad, but to avoid making the situation
+ * even worse we have to ignore dead tasks so we do not
+ * step on zeroed-out members (e.g. t->vfork_done is
+ * already NULL on that path, since we were called by
+ * do_exit()).
*/
- if (prev->flags & PF_WQ_WORKER) {
+ if (prev->flags & PF_WQ_WORKER &&
+ prev->state != TASK_DEAD) {
struct task_struct *to_wakeup;
to_wakeup = wq_worker_sleeping(prev);
--
2.9.3
* Re: [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread
2016-09-21 15:43 [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread Roman Pen
2016-09-21 15:43 ` [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead Roman Pen
@ 2016-10-20 23:07 ` Andy Lutomirski
2016-10-21 5:56 ` Peter Zijlstra
1 sibling, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2016-10-20 23:07 UTC (permalink / raw)
To: Roman Pen
Cc: Andy Lutomirski, Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H . Peter Anvin, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Tejun Heo, X86 ML, linux-kernel
On Wed, Sep 21, 2016 at 8:43 AM, Roman Pen
<roman.penyaev@profitbricks.com> wrote:
> A kthread keeps a completion structure on its stack and is woken up
> through it when vfork_done is completed.
>
> In commit 2deb4be28 Andy Lutomirski made the oops path rewind the stack
> unconditionally, so a later completion of task->vfork_done for any
> kthread corrupts the stack (or spins forever trying to lock garbage
> memory).
This is sort of okay, but it will blow up pretty badly if a kthread
overflows its stack. Would it make more sense to change
rewind_stack_do_exit() to leave a big enough gap at the top of the
stack to avoid clobbering the completion?
--Andy
* Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
2016-09-21 15:43 ` [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead Roman Pen
@ 2016-10-20 23:08 ` Andy Lutomirski
2016-10-21 15:47 ` Oleg Nesterov
2016-10-21 5:39 ` Peter Zijlstra
1 sibling, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2016-10-20 23:08 UTC (permalink / raw)
To: Roman Pen, Oleg Nesterov
Cc: Andy Lutomirski, Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H . Peter Anvin, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Tejun Heo, linux-kernel
On Wed, Sep 21, 2016 at 8:43 AM, Roman Pen
<roman.penyaev@profitbricks.com> wrote:
> If panic_on_oops is not set and an oops happens inside a workqueue
> kthread, the kernel kills that kthread. This patch fixes the recursive
> GPF which then occurs, with the following stack:
Oleg, can you take a look at this?
--Andy
>
> [<ffffffff81397f75>] dump_stack+0x68/0x93
> [<ffffffff8106954b>] ? do_exit+0x7ab/0xc10
> [<ffffffff8108fd73>] __schedule_bug+0x83/0xe0
> [<ffffffff81716d5a>] __schedule+0x7ea/0xba0
> [<ffffffff810c864f>] ? vprintk_default+0x1f/0x30
> [<ffffffff8116a63c>] ? printk+0x48/0x50
> [<ffffffff81717150>] schedule+0x40/0x90
> [<ffffffff8106976a>] do_exit+0x9ca/0xc10
> [<ffffffff810c8e3d>] ? kmsg_dump+0x11d/0x190
> [<ffffffff810c8d37>] ? kmsg_dump+0x17/0x190
> [<ffffffff81021ee9>] oops_end+0x99/0xd0
> [<ffffffff81052da5>] no_context+0x185/0x3e0
> [<ffffffff81053083>] __bad_area_nosemaphore+0x83/0x1c0
> [<ffffffff810c820e>] ? vprintk_emit+0x25e/0x530
> [<ffffffff810531d4>] bad_area_nosemaphore+0x14/0x20
> [<ffffffff8105355c>] __do_page_fault+0xac/0x570
> [<ffffffff810c66fe>] ? console_trylock+0x1e/0xe0
> [<ffffffff81002036>] ? trace_hardirqs_off_thunk+0x1a/0x1c
> [<ffffffff81053a2c>] do_page_fault+0xc/0x10
> [<ffffffff8171f812>] page_fault+0x22/0x30
> [<ffffffff81089bc3>] ? kthread_data+0x33/0x40
> [<ffffffff8108427e>] ? wq_worker_sleeping+0xe/0x80
> [<ffffffff817169eb>] __schedule+0x47b/0xba0
> [<ffffffff81717150>] schedule+0x40/0x90
> [<ffffffff8106957d>] do_exit+0x7dd/0xc10
> [<ffffffff81021ee9>] oops_end+0x99/0xd0
>
> The root cause is that the zeroed task->vfork_done member is accessed
> from the wq_worker_sleeping() hook. It is zeroed out on the following
> path:
>
> oops_end()
> do_exit()
> exit_mm()
> mm_release()
> complete_vfork_done()
>
> To fix the bug, dead tasks must be ignored.
>
> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> ---
> kernel/sched/core.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2c303e7..50772e5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
> * If a worker went to sleep, notify and ask workqueue
> * whether it wants to wake up a task to maintain
> * concurrency.
> + *
> + * Also the following stack is possible:
> + * oops_end()
> + * do_exit()
> + * schedule()
> + *
> > + * If panic_on_oops is not set and an oops happens on
> > + * a workqueue execution path, the thread will be killed.
> > + * That is definitely sad, but to avoid making the situation
> > + * even worse we have to ignore dead tasks so we do not
> > + * step on zeroed-out members (e.g. t->vfork_done is
> > + * already NULL on that path, since we were called by
> > + * do_exit()).
> */
> - if (prev->flags & PF_WQ_WORKER) {
> + if (prev->flags & PF_WQ_WORKER &&
> + prev->state != TASK_DEAD) {
> struct task_struct *to_wakeup;
>
> to_wakeup = wq_worker_sleeping(prev);
> --
> 2.9.3
>
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
2016-09-21 15:43 ` [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead Roman Pen
2016-10-20 23:08 ` Andy Lutomirski
@ 2016-10-21 5:39 ` Peter Zijlstra
1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2016-10-21 5:39 UTC (permalink / raw)
To: Roman Pen
Cc: Andy Lutomirski, Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Denys Vlasenko, H . Peter Anvin, Thomas Gleixner, Ingo Molnar,
Tejun Heo, linux-kernel
On Wed, Sep 21, 2016 at 05:43:50PM +0200, Roman Pen wrote:
> If panic_on_oops is not set and an oops happens inside a workqueue
> kthread, the kernel kills that kthread. This patch fixes the recursive
> GPF which then occurs, with the following stack:
> The root cause is that the zeroed task->vfork_done member is accessed
> from the wq_worker_sleeping() hook.
This is the kthread_data() -> to_kthread() thing? Could've done with
spelling that out; now you had me searching all over :/
Urgh what a mess..
* Re: [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread
2016-10-20 23:07 ` [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread Andy Lutomirski
@ 2016-10-21 5:56 ` Peter Zijlstra
2016-10-21 8:05 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2016-10-21 5:56 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Roman Pen, Andy Lutomirski, Josh Poimboeuf, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H . Peter Anvin, Thomas Gleixner,
Ingo Molnar, Tejun Heo, X86 ML, linux-kernel
On Thu, Oct 20, 2016 at 04:07:28PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 21, 2016 at 8:43 AM, Roman Pen
> <roman.penyaev@profitbricks.com> wrote:
> > A kthread keeps a completion structure on its stack and is woken up
> > through it when vfork_done is completed.
> >
> > In commit 2deb4be28 Andy Lutomirski made the oops path rewind the stack
> > unconditionally, so a later completion of task->vfork_done for any
> > kthread corrupts the stack (or spins forever trying to lock garbage
> > memory).
>
> This is sort of okay, but it will blow up pretty badly if a kthread
> overflows its stack. Would it make more sense to change
> rewind_stack_do_exit() to leave a big enough gap at the top of the
> stack to avoid clobbering the completion?
We need to preserve the entire struct kthread on the stack, kthread just
abuses that pointer to stash an on-stack kthread descriptor. See
kthread():
current->vfork_done = &self.exited;
It's a horrible, horrible thing kthread does. I suppose there might have
been some intent behind keeping that exited completion last in the
structure, but *shudder*.
But yes, leaving enough stack to not clobber that might keep this horror
show working.
ISTR talk about alternative schemes for this a long time ago, but I
cannot recall :-(
* Re: [PATCH 1/2] x86/dumpstack: on oops do not rewind stack for kthread
2016-10-21 5:56 ` Peter Zijlstra
@ 2016-10-21 8:05 ` Thomas Gleixner
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2016-10-21 8:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, Roman Pen, Andy Lutomirski, Josh Poimboeuf,
Borislav Petkov, Brian Gerst, Denys Vlasenko, H . Peter Anvin,
Ingo Molnar, Tejun Heo, X86 ML, linux-kernel
On Fri, 21 Oct 2016, Peter Zijlstra wrote:
> We need to preserve the entire struct kthread on the stack, kthread just
> abuses that pointer to stash an on-stack kthread descriptor. See
> kthread():
>
> current->vfork_done = &self.exited;
>
> It's a horrible, horrible thing kthread does. I suppose there might have
> been some intent behind keeping that exited completion last in the
> structure, but *shudder*.
>
> But yes, leaving enough stack to not clobber that might keep this horror
> show working.
>
> ISTR talk about alternative schemes for this a long time ago, but I
> cannot recall :-(
The simplest solution would be to stick struct kthread into task_struct,
but that's bloat.
But we can allocate it separately along with kthread_create_info. That's
pretty straightforward.
Thanks,
tglx
* Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
2016-10-20 23:08 ` Andy Lutomirski
@ 2016-10-21 15:47 ` Oleg Nesterov
2016-10-24 16:01 ` Roman Penyaev
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2016-10-21 15:47 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Roman Pen, Andy Lutomirski, Josh Poimboeuf, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H . Peter Anvin, Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Tejun Heo, linux-kernel
On 10/20, Andy Lutomirski wrote:
>
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
> > * If a worker went to sleep, notify and ask workqueue
> > * whether it wants to wake up a task to maintain
> > * concurrency.
> > + *
> > + * Also the following stack is possible:
> > + * oops_end()
> > + * do_exit()
> > + * schedule()
> > + *
> > + * If panic_on_oops is not set and an oops happens on
> > + * a workqueue execution path, the thread will be killed.
> > + * That is definitely sad, but to avoid making the situation
> > + * even worse we have to ignore dead tasks so we do not
> > + * step on zeroed-out members (e.g. t->vfork_done is
> > + * already NULL on that path, since we were called by
> > + * do_exit()).
And we have more problems like this. Say, if blk_flush_plug_list()
crashes, it will likely crash again and again, recursively.
> > */
> > - if (prev->flags & PF_WQ_WORKER) {
> > + if (prev->flags & PF_WQ_WORKER &&
> > + prev->state != TASK_DEAD) {
I don't think we should change __schedule()... Can't we simply clear
PF_WQ_WORKER in complete_vfork_done() ? Or add the PF_EXITING checks
into wq_worker_sleeping() and wq_worker_waking_up().
Or perhaps something like the change below.
Oleg.
--- x/kernel/workqueue.c
+++ x/kernel/workqueue.c
@@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
}
}
+static void oops_handler(struct callback_head *oops_work)
+{
+ if (!(current->flags & PF_WQ_WORKER))
+ return;
+
+ clear PF_WQ_WORKER, probably do more cleanups
+}
+
/**
* worker_thread - the worker thread function
* @__worker: self
@@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
*/
static int worker_thread(void *__worker)
{
+ struct callback_head oops_work;
struct worker *worker = __worker;
struct worker_pool *pool = worker->pool;
/* tell the scheduler that this is a workqueue worker */
worker->task->flags |= PF_WQ_WORKER;
+ init_task_work(&oops_work, oops_handler);
+ task_work_add(current, &oops_work, false);
woke_up:
spin_lock_irq(&pool->lock);
* Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead
2016-10-21 15:47 ` Oleg Nesterov
@ 2016-10-24 16:01 ` Roman Penyaev
0 siblings, 0 replies; 9+ messages in thread
From: Roman Penyaev @ 2016-10-24 16:01 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andy Lutomirski, Andy Lutomirski, Josh Poimboeuf,
Borislav Petkov, Brian Gerst, Denys Vlasenko, H . Peter Anvin,
Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Tejun Heo,
linux-kernel
On Fri, Oct 21, 2016 at 5:47 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/20, Andy Lutomirski wrote:
>>
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>> > * If a worker went to sleep, notify and ask workqueue
>> > * whether it wants to wake up a task to maintain
>> > * concurrency.
>> > + *
>> > + * Also the following stack is possible:
>> > + * oops_end()
>> > + * do_exit()
>> > + * schedule()
>> > + *
>> > + * If panic_on_oops is not set and oops happens on
>> > + * a workqueue execution path, thread will be killed.
>> > + * That is definitly sad, but not to make the situation
>> > + * even worse we have to ignore dead tasks in order not
>> > + * to step on zeroed out members (e.g. t->vfork_done is
>> > + * already NULL on that path, since we were called by
>> > + * do_exit()))
>
> And we have more problems like this. Say, if blk_flush_plug_list()
> crashes it will likely crash again and again recursively.
I will send a patch if I reproduce it :)
>
>> > */
>> > - if (prev->flags & PF_WQ_WORKER) {
>> > + if (prev->flags & PF_WQ_WORKER &&
>> > + prev->state != TASK_DEAD) {
>
> I don't think we should change __schedule()... Can't we simply clear
> PF_WQ_WORKER in complete_vfork_done() ? Or add the PF_EXITING checks
> into wq_worker_sleeping() and wq_worker_waking_up().
Yeah, handling this corner case in the wq_worker_sleeping() function is
probably much better.
>
> Or perhaps something like the change below.
That's nice stuff, thanks Oleg. I simply did not know about these
callbacks.
But the huge problem is that after commit 2deb4be28 by Andy Lutomirski
we can't use the stack once we are already in do_exit(). And putting
this callback head inside the worker structure is bloat. I will resend
this with a simple task state check in wq_worker_sleeping().
--
Roman
>
> Oleg.
>
> --- x/kernel/workqueue.c
> +++ x/kernel/workqueue.c
> @@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
> }
> }
>
> +static void oops_handler(struct callback_head *oops_work)
> +{
> + if (!(current->flags & PF_WQ_WORKER))
> + return;
> +
> + clear PF_WQ_WORKER, probably do more cleanups
> +}
> +
> /**
> * worker_thread - the worker thread function
> * @__worker: self
> @@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
> */
> static int worker_thread(void *__worker)
> {
> + struct callback_head oops_work;
> struct worker *worker = __worker;
> struct worker_pool *pool = worker->pool;
>
> /* tell the scheduler that this is a workqueue worker */
> worker->task->flags |= PF_WQ_WORKER;
> + init_task_work(&oops_work, oops_handler);
> + task_work_add(current, &oops_work, false);
> woke_up:
> spin_lock_irq(&pool->lock);
>
>