From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751826AbdB0WJm (ORCPT ); Mon, 27 Feb 2017 17:09:42 -0500
Received: from scorn.kernelslacker.org ([45.56.101.199]:37888 "EHLO scorn.kernelslacker.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751612AbdB0WIs (ORCPT ); Mon, 27 Feb 2017 17:08:48 -0500
Date: Mon, 27 Feb 2017 14:57:56 -0500
From: Dave Jones
To: Tejun Heo
Cc: Linux Kernel, Ursula Braun
Subject: Re: __queue_work oops.
Message-ID: <20170227195756.apbhfr2lcygpevaj@codemonkey.org.uk>
Mail-Followup-To: Dave Jones, Tejun Heo, Linux Kernel, Ursula Braun
References: <20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk> <20170227183925.GA8707@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170227183925.GA8707@htj.duckdns.org>
User-Agent: NeoMutt/20170113 (1.7.2)
X-Spam-Score: -1.1 (-)
X-Spam-Report: Spam detection software, running on the system "scorn.kernelslacker.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details.
Content analysis details: (-1.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                            [score: 0.0000]
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 27, 2017 at 01:39:25PM -0500, Tejun Heo wrote:
> On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9
> > task: ffff88017f105440 task.stack: ffffc90000094000
> > RIP: 0010:__queue_work+0x2d/0x700
> > RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
> > RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
> > RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
> > R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
> > FS:  0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
> > Call Trace:
> >
> >  ? work_on_cpu+0xb0/0xb0
> >  delayed_work_timer_fn+0x1e/0x20
> >  call_timer_fn+0xbd/0x480
> ...
> > Code starting with the faulting instruction
> > ===========================================
> >    0:   41 f6 85 c2 01 00 00    testb  $0x1,0x1c2(%r13)
> >    7:   01
> >    8:   0f 85 22 04 00 00       jne    0x430
> >    e:   49                      rex.WB
> >    f:   bc eb 83 b5 80          mov    $0x80b583eb,%esp
> >   14:   46                      rex.RX
> >
> > 0000000000003cf0 <__queue_work>:
> > {
> >     3cf0:       e8 00 00 00 00          callq  3cf5 <__queue_work+0x5>
> >     3cf5:       55                      push   %rbp
> >     3cf6:       48 89 e5                mov    %rsp,%rbp
> >     3cf9:       41 57                   push   %r15
> >     3cfb:       49 89 d7                mov    %rdx,%r15
> >     3cfe:       41 56                   push   %r14
> >         unsigned int req_cpu = cpu;
> >     3d00:       41 89 fe                mov    %edi,%r14d
> > {
> >     3d03:       41 55                   push   %r13
> >     3d05:       49 89 f5                mov    %rsi,%r13
> >     3d08:       41 54                   push   %r12
> >     3d0a:       53                      push   %rbx
> >     3d0b:       48 83 ec 10             sub    $0x10,%rsp
> >     3d0f:       89 7d d4                mov    %edi,-0x2c(%rbp)
> >         asm volatile("# __raw_save_flags\n\t"
> >     3d12:       9c                      pushfq
> >     3d13:       58                      pop    %rax
> >         WARN_ON_ONCE(!irqs_disabled());
> >     3d14:       f6 c4 02                test   $0x2,%ah
> >     3d17:       0f 85 06 04 00 00       jne    4123 <__queue_work+0x433>
> >         if (unlikely(wq->flags & __WQ_DRAINING) &&
> >     3d1d:       41 f6 85 c2 01 00 00    testb  $0x1,0x1c2(%r13)
> >
> >
> > So we called __queue_work with a null wq.
>
> So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
> and when the timeout expires the timer callback trying to queue
> against NULL.  Hmm... the work function would be able to tell us who
> queued it but it isn't part of the information dumped here (would be
> 0x18(%rdx)).
>
> I'll add a sanity check on queue_delayed_work_on() so that we can
> catch it synchronously when it happens.

I dumped work->func, and found it pointed to smc_close_sock_put_work

	Dave
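
For anyone following along, the arithmetic behind the "null wq" conclusion can be sketched in a few lines of userspace C. The register and CR2 values are copied from the dump quoted above. The helper names are made up for illustration and are not kernel APIs. The step from the byte test to bit 16 assumes that __WQ_DRAINING is 1 << 16 and that wq->flags is a little-endian 32-bit field, which would put flags at offset 0x1c0 in this build; that offset is an inference from the disassembly, not something read from the struct layout.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump quoted above. */
#define OOPS_R13   UINT64_C(0x0000000000000000) /* wq, moved from %rsi at 3d05 */
#define OOPS_CR2   UINT64_C(0x00000000000001c2) /* faulting address */

/* Operands of the faulting instruction: testb $0x1,0x1c2(%r13). */
#define TESTB_DISP 0x1c2
#define TESTB_IMM  0x01

/* Effective address of a base+displacement memory operand. */
static uint64_t effective_address(uint64_t base, int32_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}

/*
 * Hypothetical helper: rebuild the 32-bit mask that a byte-wide test
 * checks when it targets byte 'byte_in_field' of a little-endian field.
 */
static uint32_t mask_from_byte_test(uint8_t imm, unsigned int byte_in_field)
{
	return (uint32_t)imm << (8 * byte_in_field);
}

/* Returns 1 when the dump is consistent with a NULL wq dereference. */
static int oops_matches_null_wq(void)
{
	/* NULL base plus the displacement must equal the faulting address. */
	if (effective_address(OOPS_R13, TESTB_DISP) != OOPS_CR2)
		return 0;
	/* Assumption: flags starts at 0x1c0, so 0x1c2 is byte 2 (bits 16-23). */
	return mask_from_byte_test(TESTB_IMM, TESTB_DISP - 0x1c0) == (UINT32_C(1) << 16);
}
```

With R13 equal to zero the effective address is just the displacement, 0x1c2, which is exactly what CR2 reports. That is why the fault points at the wq->flags test rather than at anything involving the work item in %rdx.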