linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* __queue_work oops.
@ 2017-02-27 17:14 Dave Jones
  2017-02-27 18:39 ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Jones @ 2017-02-27 17:14 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Tejun Heo

Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9 
task: ffff88017f105440 task.stack: ffffc90000094000
RIP: 0010:__queue_work+0x2d/0x700
RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
Call Trace:
 <IRQ>
 ? work_on_cpu+0xb0/0xb0
 delayed_work_timer_fn+0x1e/0x20
 call_timer_fn+0xbd/0x480
 ? call_timer_fn+0x5/0x480
 ? work_on_cpu+0xb0/0xb0
 run_timer_softirq+0x1a1/0x700
 __do_softirq+0xbf/0x5b1
 irq_exit+0xb5/0xc0
 smp_apic_timer_interrupt+0x3d/0x50
 apic_timer_interrupt+0x97/0xa0
RIP: 0010:cpuidle_enter_state+0x12e/0x400
RSP: 0018:ffffc90000097e40 EFLAGS: 00000202
[CONT START]  ORIG_RAX: ffffffffffffff10
RAX: ffff88017f105440 RBX: ffffe8ffffa03dc8 RCX: 0000000000000001
RDX: 20c49ba5e353f7cf RSI: 0000000000000001 RDI: ffff88017f105440
RBP: ffffc90000097e80 R08: cccccccccccccccd R09: 0000000000000018
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: ffffffff81eaf198 R14: 0000000000000005 R15: ffffffff81eaf180
 </IRQ>
 cpuidle_enter+0x17/0x20
 call_cpuidle+0x23/0x40
 do_idle+0xfb/0x200
 cpu_startup_entry+0x71/0x80
 start_secondary+0x16a/0x210
 start_cpu+0x14/0x14
Code: 44 00 00 55 48 89 e5 41 57 49 89 d7 41 56 41 89 fe 41 55 49 89 f5 41 54 53 48 83 ec 10 89 7d d4 9c 58 f6 c4 02 0f 85 06 04 00 00 <41> f6 85 c2 01 00 00 01 0f 85 22 04 00 00 49 bc eb 83 b5 80 46 

Code starting with the faulting instruction
===========================================
   0:	41 f6 85 c2 01 00 00 	testb  $0x1,0x1c2(%r13)
   7:	01 
   8:	0f 85 22 04 00 00    	jne    0x430
   e:	49                   	rex.WB
   f:	bc eb 83 b5 80       	mov    $0x80b583eb,%esp
  14:	46                   	rex.RX



0000000000003cf0 <__queue_work>:
{
    3cf0:       e8 00 00 00 00          callq  3cf5 <__queue_work+0x5>
    3cf5:       55                      push   %rbp
    3cf6:       48 89 e5                mov    %rsp,%rbp
    3cf9:       41 57                   push   %r15
    3cfb:       49 89 d7                mov    %rdx,%r15
    3cfe:       41 56                   push   %r14
        unsigned int req_cpu = cpu;
    3d00:       41 89 fe                mov    %edi,%r14d
{
    3d03:       41 55                   push   %r13
    3d05:       49 89 f5                mov    %rsi,%r13
    3d08:       41 54                   push   %r12
    3d0a:       53                      push   %rbx
    3d0b:       48 83 ec 10             sub    $0x10,%rsp
    3d0f:       89 7d d4                mov    %edi,-0x2c(%rbp)
        asm volatile("# __raw_save_flags\n\t"
    3d12:       9c                      pushfq 
    3d13:       58                      pop    %rax
        WARN_ON_ONCE(!irqs_disabled());
    3d14:       f6 c4 02                test   $0x2,%ah
    3d17:       0f 85 06 04 00 00       jne    4123 <__queue_work+0x433>
        if (unlikely(wq->flags & __WQ_DRAINING) &&
    3d1d:       41 f6 85 c2 01 00 00    testb  $0x1,0x1c2(%r13)


So we called __queue_work with a null wq.

Thoughts ?

	Dave

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: __queue_work oops.
  2017-02-27 17:14 __queue_work oops Dave Jones
@ 2017-02-27 18:39 ` Tejun Heo
  2017-02-27 19:57   ` Dave Jones
  2017-03-06 20:39   ` Tejun Heo
  0 siblings, 2 replies; 4+ messages in thread
From: Tejun Heo @ 2017-02-27 18:39 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel

Hello,

On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9 
> task: ffff88017f105440 task.stack: ffffc90000094000
> RIP: 0010:__queue_work+0x2d/0x700
> RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
> RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
> RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
> R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
> FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
> Call Trace:
>  <IRQ>
>  ? work_on_cpu+0xb0/0xb0
>  delayed_work_timer_fn+0x1e/0x20
>  call_timer_fn+0xbd/0x480
...
> Code starting with the faulting instruction
> ===========================================
>    0:	41 f6 85 c2 01 00 00 	testb  $0x1,0x1c2(%r13)
>    7:	01 
>    8:	0f 85 22 04 00 00    	jne    0x430
>    e:	49                   	rex.WB
>    f:	bc eb 83 b5 80       	mov    $0x80b583eb,%esp
>   14:	46                   	rex.RX
> 
> 0000000000003cf0 <__queue_work>:
> {
>     3cf0:       e8 00 00 00 00          callq  3cf5 <__queue_work+0x5>
>     3cf5:       55                      push   %rbp
>     3cf6:       48 89 e5                mov    %rsp,%rbp
>     3cf9:       41 57                   push   %r15
>     3cfb:       49 89 d7                mov    %rdx,%r15
>     3cfe:       41 56                   push   %r14
>         unsigned int req_cpu = cpu;
>     3d00:       41 89 fe                mov    %edi,%r14d
> {
>     3d03:       41 55                   push   %r13
>     3d05:       49 89 f5                mov    %rsi,%r13
>     3d08:       41 54                   push   %r12
>     3d0a:       53                      push   %rbx
>     3d0b:       48 83 ec 10             sub    $0x10,%rsp
>     3d0f:       89 7d d4                mov    %edi,-0x2c(%rbp)
>         asm volatile("# __raw_save_flags\n\t"
>     3d12:       9c                      pushfq 
>     3d13:       58                      pop    %rax
>         WARN_ON_ONCE(!irqs_disabled());
>     3d14:       f6 c4 02                test   $0x2,%ah
>     3d17:       0f 85 06 04 00 00       jne    4123 <__queue_work+0x433>
>         if (unlikely(wq->flags & __WQ_DRAINING) &&
>     3d1d:       41 f6 85 c2 01 00 00    testb  $0x1,0x1c2(%r13)
> 
> 
> So we called __queue_work with a null wq.

So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
and when the timeout expires the timer callback trying to queue
against NULL.  Hmm... the work function would be able to tell us who
queued it but it isn't part of the information dumped here (would be
0x18(%rdx)).

I'll add a sanity check on queue_delayed_work_on() so that we can
catch it synchronously when it happens.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: __queue_work oops.
  2017-02-27 18:39 ` Tejun Heo
@ 2017-02-27 19:57   ` Dave Jones
  2017-03-06 20:39   ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Dave Jones @ 2017-02-27 19:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux Kernel, Ursula Braun

On Mon, Feb 27, 2017 at 01:39:25PM -0500, Tejun Heo wrote:

 > On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
 > > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 > > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9 
 > > task: ffff88017f105440 task.stack: ffffc90000094000
 > > RIP: 0010:__queue_work+0x2d/0x700
 > > RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
 > > RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
 > > RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
 > > RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
 > > R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
 > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
 > > FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
 > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 > > CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
 > > Call Trace:
 > >  <IRQ>
 > >  ? work_on_cpu+0xb0/0xb0
 > >  delayed_work_timer_fn+0x1e/0x20
 > >  call_timer_fn+0xbd/0x480
 > ...
 > > Code starting with the faulting instruction
 > > ===========================================
 > >    0:	41 f6 85 c2 01 00 00 	testb  $0x1,0x1c2(%r13)
 > >    7:	01 
 > >    8:	0f 85 22 04 00 00    	jne    0x430
 > >    e:	49                   	rex.WB
 > >    f:	bc eb 83 b5 80       	mov    $0x80b583eb,%esp
 > >   14:	46                   	rex.RX
 > > 
 > > 0000000000003cf0 <__queue_work>:
 > > {
 > >     3cf0:       e8 00 00 00 00          callq  3cf5 <__queue_work+0x5>
 > >     3cf5:       55                      push   %rbp
 > >     3cf6:       48 89 e5                mov    %rsp,%rbp
 > >     3cf9:       41 57                   push   %r15
 > >     3cfb:       49 89 d7                mov    %rdx,%r15
 > >     3cfe:       41 56                   push   %r14
 > >         unsigned int req_cpu = cpu;
 > >     3d00:       41 89 fe                mov    %edi,%r14d
 > > {
 > >     3d03:       41 55                   push   %r13
 > >     3d05:       49 89 f5                mov    %rsi,%r13
 > >     3d08:       41 54                   push   %r12
 > >     3d0a:       53                      push   %rbx
 > >     3d0b:       48 83 ec 10             sub    $0x10,%rsp
 > >     3d0f:       89 7d d4                mov    %edi,-0x2c(%rbp)
 > >         asm volatile("# __raw_save_flags\n\t"
 > >     3d12:       9c                      pushfq 
 > >     3d13:       58                      pop    %rax
 > >         WARN_ON_ONCE(!irqs_disabled());
 > >     3d14:       f6 c4 02                test   $0x2,%ah
 > >     3d17:       0f 85 06 04 00 00       jne    4123 <__queue_work+0x433>
 > >         if (unlikely(wq->flags & __WQ_DRAINING) &&
 > >     3d1d:       41 f6 85 c2 01 00 00    testb  $0x1,0x1c2(%r13)
 > > 
 > > 
 > > So we called __queue_work with a null wq.
 > 
 > So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
 > and when the timeout expires the timer callback trying to queue
 > against NULL.  Hmm... the work function would be able to tell us who
 > queued it but it isn't part of the information dumped here (would be
 > 0x18(%rdx)).
 > 
 > I'll add a sanity check on queue_delayed_work_on() so that we can
 > catch it synchronously when it happens.

I dumped work->func, and found it pointed to smc_close_sock_put_work

<smc maintainer cc'd>

	Dave
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: __queue_work oops.
  2017-02-27 18:39 ` Tejun Heo
  2017-02-27 19:57   ` Dave Jones
@ 2017-03-06 20:39   ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2017-03-06 20:39 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel

Hello,

Applied the following to wq/for-4.11-fixes.

Thanks.

------ 8< ------
>From 637fdbae60d6cb9f6e963c1079d7e0445c86ff7d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 6 Mar 2017 15:33:42 -0500
Subject: [PATCH] workqueue: trigger WARN if queue_delayed_work() is called
 with NULL @wq

If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender.  This actually happened with smc.

__queue_delayed_work() already does several input sanity checks
synchronously.  Add NULL @wq check.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 072cbc9..c0168b7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1507,6 +1507,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
 
+	WARN_ON_ONCE(!wq);
 	WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
 		     timer->data != (unsigned long)dwork);
 	WARN_ON_ONCE(timer_pending(timer));
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-03-06 21:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27 17:14 __queue_work oops Dave Jones
2017-02-27 18:39 ` Tejun Heo
2017-02-27 19:57   ` Dave Jones
2017-03-06 20:39   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).