From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756381AbbCCNVX (ORCPT );
	Tue, 3 Mar 2015 08:21:23 -0500
Received: from mail-qg0-f51.google.com ([209.85.192.51]:59414 "EHLO
	mail-qg0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754682AbbCCNVV (ORCPT );
	Tue, 3 Mar 2015 08:21:21 -0500
Date: Tue, 3 Mar 2015 08:21:15 -0500
From: Tejun Heo
To: Tomeu Vizoso
Cc: Jesper Nilsson, Rabin Vincent, Jesper Nilsson,
	"linux-kernel@vger.kernel.org"
Subject: Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing
	cancel[_delayed]_work_sync()'s for PREEMPT_NONE
Message-ID: <20150303132115.GB3122@htj.duckdns.org>
References: <20150206171156.GA8942@axis.com>
	<20150209161527.GH3220@htj.duckdns.org>
	<20150302122615.GE11399@axis.com>
	<20150302162144.GF17694@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Tomeu.

On Tue, Mar 03, 2015 at 11:00:43AM +0100, Tomeu Vizoso wrote:
...
> [ 7.421803] PC is at wake_bit_function+0x18/0x6c
> [ 7.428168] LR is at __wake_up_common+0x5c/0x90
...
> [ 7.673183] [] (wake_bit_function) from [] (__wake_up_common+0x5c/0x90)
> [ 7.683300] [] (__wake_up_common) from [] (__wake_up+0x48/0x5c)
> [ 7.692744] [] (__wake_up) from [] (__cancel_work_timer+0xe8/0x1ac)
> [ 7.702533] [] (__cancel_work_timer) from [] (cancel_work_sync+0x1c/0x20)
> [ 7.712854] [] (cancel_work_sync) from [] (css_free_work_fn+0x174/0x2ec)
> [ 7.723099] [] (css_free_work_fn) from [] (process_one_work+0x15c/0x3dc)
> [ 7.733339] [] (process_one_work) from [] (worker_thread+0x54/0x4e8)
> [ 7.743224] [] (worker_thread) from [] (kthread+0xec/0x104)
> [ 7.752339] [] (kthread) from [] (ret_from_fork+0x14/0x34)
> [ 7.761366] Code: e24cb004 e52de004 e8bd4000 e510400c (e5935000)

Hah, weird.  How is your machine ending up in wake_bit_function() from
there?  Can you please apply the following patch and report the dmesg
after crash?

Thanks.

---
 kernel/workqueue.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2755,8 +2755,12 @@ static bool __cancel_work_timer(struct w
 		if (unlikely(ret == -ENOENT)) {
 			DEFINE_WAIT(wait);
 			prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE);
-			if (work_is_canceling(work))
+			WARN_ON_ONCE(wait.func != autoremove_wake_function);
+			if (work_is_canceling(work)) {
+				printk("XXX __cancel_work_timer: %d sleeping w/ %pf, wait=%p\n",
+				       task_pid_nr(current), wait.func, &wait);
 				schedule();
+			}
 			finish_wait(waitq, &wait);
 		}
 	} while (unlikely(ret < 0));
@@ -2774,8 +2778,16 @@ static bool __cancel_work_timer(struct w
 	 * visible there.
 	 */
 	smp_mb();
-	if (waitqueue_active(waitq))
+	if (waitqueue_active(waitq)) {
+		wait_queue_t *cur;
+
+		spin_lock_irq(&waitq->lock);
+		list_for_each_entry(cur, &waitq->task_list, task_list)
+			printk("XXX __cancel_work_timer: waking up %pf, wait=%p\n",
+			       cur->func, cur);
+		spin_unlock_irq(&waitq->lock);
 		wake_up(waitq);
+	}
 
 	return ret;
 }