* __queue_work oops.
@ 2017-02-27 17:14 Dave Jones
2017-02-27 18:39 ` Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Dave Jones @ 2017-02-27 17:14 UTC (permalink / raw)
To: Linux Kernel; +Cc: Tejun Heo
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9
task: ffff88017f105440 task.stack: ffffc90000094000
RIP: 0010:__queue_work+0x2d/0x700
RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
FS: 0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
Call Trace:
<IRQ>
? work_on_cpu+0xb0/0xb0
delayed_work_timer_fn+0x1e/0x20
call_timer_fn+0xbd/0x480
? call_timer_fn+0x5/0x480
? work_on_cpu+0xb0/0xb0
run_timer_softirq+0x1a1/0x700
__do_softirq+0xbf/0x5b1
irq_exit+0xb5/0xc0
smp_apic_timer_interrupt+0x3d/0x50
apic_timer_interrupt+0x97/0xa0
RIP: 0010:cpuidle_enter_state+0x12e/0x400
RSP: 0018:ffffc90000097e40 EFLAGS: 00000202
[CONT START] ORIG_RAX: ffffffffffffff10
RAX: ffff88017f105440 RBX: ffffe8ffffa03dc8 RCX: 0000000000000001
RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff88017f105440
RBP: ffffc90000097e80 R08: cccccccccccccccd R09: 0000000000000018
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: ffffffff81eaf198 R14: 0000000000000005 R15: ffffffff81eaf180
</IRQ>
cpuidle_enter+0x17/0x20
call_cpuidle+0x23/0x40
do_idle+0xfb/0x200
cpu_startup_entry+0x71/0x80
start_secondary+0x16a/0x210
start_cpu+0x14/0x14
Code: 44 00 00 55 48 89 e5 41 57 49 89 d7 41 56 41 89 fe 41 55 49 89 f5 41 54 53 48 83 ec 10 89 7d d4 9c 58 f6 c4 02 0f 85 06 04 00 00 <41> f6 85 c2 01 00 00 01 0f 85 22 04 00 00 49 bc eb 83 b5 80 46
Code starting with the faulting instruction
===========================================
0: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
7: 01
8: 0f 85 22 04 00 00 jne 0x430
e: 49 rex.WB
f: bc eb 83 b5 80 mov $0x80b583eb,%esp
14: 46 rex.RX
0000000000003cf0 <__queue_work>:
{
3cf0: e8 00 00 00 00 callq 3cf5 <__queue_work+0x5>
3cf5: 55 push %rbp
3cf6: 48 89 e5 mov %rsp,%rbp
3cf9: 41 57 push %r15
3cfb: 49 89 d7 mov %rdx,%r15
3cfe: 41 56 push %r14
unsigned int req_cpu = cpu;
3d00: 41 89 fe mov %edi,%r14d
{
3d03: 41 55 push %r13
3d05: 49 89 f5 mov %rsi,%r13
3d08: 41 54 push %r12
3d0a: 53 push %rbx
3d0b: 48 83 ec 10 sub $0x10,%rsp
3d0f: 89 7d d4 mov %edi,-0x2c(%rbp)
asm volatile("# __raw_save_flags\n\t"
3d12: 9c pushfq
3d13: 58 pop %rax
WARN_ON_ONCE(!irqs_disabled());
3d14: f6 c4 02 test $0x2,%ah
3d17: 0f 85 06 04 00 00 jne 4123 <__queue_work+0x433>
if (unlikely(wq->flags & __WQ_DRAINING) &&
3d1d: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
So we called __queue_work with a null wq.
Thoughts ?
Dave
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: __queue_work oops.
2017-02-27 17:14 __queue_work oops Dave Jones
@ 2017-02-27 18:39 ` Tejun Heo
2017-02-27 19:57 ` Dave Jones
2017-03-06 20:39 ` Tejun Heo
0 siblings, 2 replies; 4+ messages in thread
From: Tejun Heo @ 2017-02-27 18:39 UTC (permalink / raw)
To: Dave Jones; +Cc: Linux Kernel
Hello,
On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9
> task: ffff88017f105440 task.stack: ffffc90000094000
> RIP: 0010:__queue_work+0x2d/0x700
> RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
> RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
> RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
> R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
> FS: 0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
> Call Trace:
> <IRQ>
> ? work_on_cpu+0xb0/0xb0
> delayed_work_timer_fn+0x1e/0x20
> call_timer_fn+0xbd/0x480
...
> Code starting with the faulting instruction
> ===========================================
> 0: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
> 7: 01
> 8: 0f 85 22 04 00 00 jne 0x430
> e: 49 rex.WB
> f: bc eb 83 b5 80 mov $0x80b583eb,%esp
> 14: 46 rex.RX
>
> 0000000000003cf0 <__queue_work>:
> {
> 3cf0: e8 00 00 00 00 callq 3cf5 <__queue_work+0x5>
> 3cf5: 55 push %rbp
> 3cf6: 48 89 e5 mov %rsp,%rbp
> 3cf9: 41 57 push %r15
> 3cfb: 49 89 d7 mov %rdx,%r15
> 3cfe: 41 56 push %r14
> unsigned int req_cpu = cpu;
> 3d00: 41 89 fe mov %edi,%r14d
> {
> 3d03: 41 55 push %r13
> 3d05: 49 89 f5 mov %rsi,%r13
> 3d08: 41 54 push %r12
> 3d0a: 53 push %rbx
> 3d0b: 48 83 ec 10 sub $0x10,%rsp
> 3d0f: 89 7d d4 mov %edi,-0x2c(%rbp)
> asm volatile("# __raw_save_flags\n\t"
> 3d12: 9c pushfq
> 3d13: 58 pop %rax
> WARN_ON_ONCE(!irqs_disabled());
> 3d14: f6 c4 02 test $0x2,%ah
> 3d17: 0f 85 06 04 00 00 jne 4123 <__queue_work+0x433>
> if (unlikely(wq->flags & __WQ_DRAINING) &&
> 3d1d: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
>
>
> So we called __queue_work with a null wq.
So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
and when the timeout expires the timer callback trying to queue
against NULL. Hmm... the work function would be able to tell us who
queued it but it isn't part of the information dumped here (would be
0x18(%rdx)).
I'll add a sanity check on queue_delayed_work_on() so that we can
catch it synchronously when it happens.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: __queue_work oops.
2017-02-27 18:39 ` Tejun Heo
@ 2017-02-27 19:57 ` Dave Jones
2017-03-06 20:39 ` Tejun Heo
1 sibling, 0 replies; 4+ messages in thread
From: Dave Jones @ 2017-02-27 19:57 UTC (permalink / raw)
To: Tejun Heo; +Cc: Linux Kernel, Ursula Braun
On Mon, Feb 27, 2017 at 01:39:25PM -0500, Tejun Heo wrote:
> On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9
> > task: ffff88017f105440 task.stack: ffffc90000094000
> > RIP: 0010:__queue_work+0x2d/0x700
> > RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
> > RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
> > RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
> > R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
> > FS: 0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
> > Call Trace:
> > <IRQ>
> > ? work_on_cpu+0xb0/0xb0
> > delayed_work_timer_fn+0x1e/0x20
> > call_timer_fn+0xbd/0x480
> ...
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
> > 7: 01
> > 8: 0f 85 22 04 00 00 jne 0x430
> > e: 49 rex.WB
> > f: bc eb 83 b5 80 mov $0x80b583eb,%esp
> > 14: 46 rex.RX
> >
> > 0000000000003cf0 <__queue_work>:
> > {
> > 3cf0: e8 00 00 00 00 callq 3cf5 <__queue_work+0x5>
> > 3cf5: 55 push %rbp
> > 3cf6: 48 89 e5 mov %rsp,%rbp
> > 3cf9: 41 57 push %r15
> > 3cfb: 49 89 d7 mov %rdx,%r15
> > 3cfe: 41 56 push %r14
> > unsigned int req_cpu = cpu;
> > 3d00: 41 89 fe mov %edi,%r14d
> > {
> > 3d03: 41 55 push %r13
> > 3d05: 49 89 f5 mov %rsi,%r13
> > 3d08: 41 54 push %r12
> > 3d0a: 53 push %rbx
> > 3d0b: 48 83 ec 10 sub $0x10,%rsp
> > 3d0f: 89 7d d4 mov %edi,-0x2c(%rbp)
> > asm volatile("# __raw_save_flags\n\t"
> > 3d12: 9c pushfq
> > 3d13: 58 pop %rax
> > WARN_ON_ONCE(!irqs_disabled());
> > 3d14: f6 c4 02 test $0x2,%ah
> > 3d17: 0f 85 06 04 00 00 jne 4123 <__queue_work+0x433>
> > if (unlikely(wq->flags & __WQ_DRAINING) &&
> > 3d1d: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
> >
> >
> > So we called __queue_work with a null wq.
>
> So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
> and when the timeout expires the timer callback trying to queue
> against NULL. Hmm... the work function would be able to tell us who
> queued it but it isn't part of the information dumped here (would be
> 0x18(%rdx)).
>
> I'll add a sanity check on queue_delayed_work_on() so that we can
> catch it synchronously when it happens.
I dumped work->func, and found it pointed to smc_close_sock_put_work
<smc maintainer cc'd>
Dave
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: __queue_work oops.
2017-02-27 18:39 ` Tejun Heo
2017-02-27 19:57 ` Dave Jones
@ 2017-03-06 20:39 ` Tejun Heo
1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2017-03-06 20:39 UTC (permalink / raw)
To: Dave Jones; +Cc: Linux Kernel
Hello,
Applied the following to wq/for-4.11-fixes.
Thanks.
------ 8< ------
>From 637fdbae60d6cb9f6e963c1079d7e0445c86ff7d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 6 Mar 2017 15:33:42 -0500
Subject: [PATCH] workqueue: trigger WARN if queue_delayed_work() is called
with NULL @wq
If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender. This actually happened with smc.
__queue_delayed_work() already does several input sanity checks
synchronously. Add NULL @wq check.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/workqueue.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 072cbc9..c0168b7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1507,6 +1507,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
struct timer_list *timer = &dwork->timer;
struct work_struct *work = &dwork->work;
+ WARN_ON_ONCE(!wq);
WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
timer->data != (unsigned long)dwork);
WARN_ON_ONCE(timer_pending(timer));
--
2.9.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-03-06 21:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27 17:14 __queue_work oops Dave Jones
2017-02-27 18:39 ` Tejun Heo
2017-02-27 19:57 ` Dave Jones
2017-03-06 20:39 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).