From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751438AbcEIOSM (ORCPT ); Mon, 9 May 2016 10:18:12 -0400
Received: from bh-25.webhostbox.net ([208.91.199.152]:35446 "EHLO bh-25.webhostbox.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751304AbcEIOSK (ORCPT ); Mon, 9 May 2016 10:18:10 -0400
Subject: Re: [PATCH] watchdog: core: Fix circular locking dependency
To: Clemens Gruber, Wim Van Sebroeck
References: <1461249494-24924-1-git-send-email-linux@roeck-us.net> <20160509135303.GA1474@archie.localdomain>
Cc: linux-watchdog@vger.kernel.org, linux-kernel@vger.kernel.org
From: Guenter Roeck
Message-ID: <57309C21.9070201@roeck-us.net>
Date: Mon, 9 May 2016 07:18:09 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To: <20160509135303.GA1474@archie.localdomain>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-Authenticated_sender: linux@roeck-us.net
X-OutGoing-Spam-Status: No, score=-1.0
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - bh-25.webhostbox.net
X-AntiAbuse: Original Domain - vger.kernel.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - roeck-us.net
X-Get-Message-Sender-Via: bh-25.webhostbox.net: authenticated_id: linux@roeck-us.net
X-Authenticated-Sender: bh-25.webhostbox.net: linux@roeck-us.net
X-Source:
X-Source-Args:
X-Source-Dir:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/09/2016 06:53 AM, Clemens Gruber wrote:
> On Thu, Apr 21, 2016 at 07:38:14AM -0700, Guenter Roeck wrote:
>> lockdep reports the following circular locking dependency.
>>
>> ======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 4.6.0-rc3-00191-gfabf418 #162 Not tainted
>> -------------------------------------------------------
>> systemd/1 is trying to acquire lock:
>>   ((&(&wd_data->work)->work)){+.+...}, at: [<80141650>] flush_work+0x0/0x280
>>
>> but task is already holding lock:
>>
>>   (&wd_data->lock){+.+...}, at: [<804acfa8>] watchdog_release+0x18/0x190
>>
>> which lock already depends on the new lock.
>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (&wd_data->lock){+.+...}:
>>        [<80662310>] mutex_lock_nested+0x64/0x4a8
>>        [<804aca4c>] watchdog_ping_work+0x18/0x4c
>>        [<80143128>] process_one_work+0x1ac/0x500
>>        [<801434b4>] worker_thread+0x38/0x554
>>        [<80149510>] kthread+0xf4/0x108
>>        [<80107c10>] ret_from_fork+0x14/0x24
>>
>> -> #0 ((&(&wd_data->work)->work)){+.+...}:
>>        [<8017c4e8>] lock_acquire+0x70/0x90
>>        [<8014169c>] flush_work+0x4c/0x280
>>        [<801440f8>] __cancel_work_timer+0x9c/0x1e0
>>        [<804acfcc>] watchdog_release+0x3c/0x190
>>        [<8022c5e8>] __fput+0x80/0x1c8
>>        [<80147b28>] task_work_run+0x94/0xc8
>>        [<8010b998>] do_work_pending+0x8c/0xb4
>>        [<80107ba8>] slow_work_pending+0xc/0x20
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&wd_data->lock);
>>                                lock((&(&wd_data->work)->work));
>>                                lock(&wd_data->lock);
>>   lock((&(&wd_data->work)->work));
>>
>>  *** DEADLOCK ***
>>
>> 1 lock held by systemd/1:
>>
>> stack backtrace:
>> CPU: 2 PID: 1 Comm: systemd Not tainted 4.6.0-rc3-00191-gfabf418 #162
>> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [<8010f5e4>] (unwind_backtrace) from [<8010c038>] (show_stack+0x10/0x14)
>> [<8010c038>] (show_stack) from [<8039d7fc>] (dump_stack+0xa8/0xd4)
>> [<8039d7fc>] (dump_stack) from [<80177ee0>] (print_circular_bug+0x214/0x334)
>> [<80177ee0>] (print_circular_bug) from [<80179230>] (check_prevs_add+0x4dc/0x8e8)
>> [<80179230>] (check_prevs_add) from [<8017b3d8>] (__lock_acquire+0xc6c/0x14ec)
>> [<8017b3d8>] (__lock_acquire) from [<8017c4e8>] (lock_acquire+0x70/0x90)
>> [<8017c4e8>] (lock_acquire) from [<8014169c>] (flush_work+0x4c/0x280)
>> [<8014169c>] (flush_work) from [<801440f8>] (__cancel_work_timer+0x9c/0x1e0)
>> [<801440f8>] (__cancel_work_timer) from [<804acfcc>] (watchdog_release+0x3c/0x190)
>> [<804acfcc>] (watchdog_release) from [<8022c5e8>] (__fput+0x80/0x1c8)
>> [<8022c5e8>] (__fput) from [<80147b28>] (task_work_run+0x94/0xc8)
>> [<80147b28>] (task_work_run) from [<8010b998>] (do_work_pending+0x8c/0xb4)
>> [<8010b998>] (do_work_pending) from [<80107ba8>] (slow_work_pending+0xc/0x20)
>>
>> Turns out the call to cancel_delayed_work_sync() in watchdog_release()
>> is not necessary and can be dropped. If the worker is no longer necessary,
>> the subsequent call to watchdog_update_worker() will cancel it. If it is
>> already running, it won't do anything, since the worker function checks
>> if it needs to ping the watchdog or not.
>>
>> Reported-by: Clemens Gruber
>> Tested-by: Clemens Gruber
>> Fixes: 11d7aba9ceb7 ("watchdog: imx2: Convert to use infrastructure triggered keepalives")
>> Signed-off-by: Guenter Roeck
>> ---
>>  drivers/watchdog/watchdog_dev.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
>> index e2c5abbb45ff..3595cffa24ea 100644
>> --- a/drivers/watchdog/watchdog_dev.c
>> +++ b/drivers/watchdog/watchdog_dev.c
>> @@ -736,7 +736,6 @@ static int watchdog_release(struct inode *inode, struct file *file)
>>  		watchdog_ping(wdd);
>>  	}
>>
>> -	cancel_delayed_work_sync(&wd_data->work);
>>  	watchdog_update_worker(wdd);
>>
>>  	/* make sure that /dev/watchdog can be re-opened */
>> --
>> 2.5.0
>>
>
> Hi,
>
> I don't see this patch in the torvalds/linux tree.
>
> Will this get in before the 4.6 release?
>

Hopefully.
If not, I assume Wim will pick it up in the next commit window and it will be
applied to -stable.

Guenter
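
The locking pattern discussed in the quoted patch can be illustrated with a small userspace sketch. This is only an analogy under stated assumptions, not the kernel code: `wd_sim`, `wd_sim_worker()`, `wd_sim_release()` and `wd_sim_run()` are hypothetical names, a pthread mutex stands in for wd_data->lock, and a plain thread stands in for the delayed work item. It shows the shape of the fixed release path: state is updated under the lock, nothing waits for the worker while the lock is held, and a worker that still runs checks whether a ping is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-watchdog data (not the kernel struct). */
struct wd_sim {
	pthread_mutex_t lock;	/* plays the role of wd_data->lock */
	bool need_ping;		/* does the worker still have to ping? */
	int pings;		/* number of pings the worker issued */
};

/* Worker body: takes the lock first, then decides whether to ping. */
void *wd_sim_worker(void *arg)
{
	struct wd_sim *wd = arg;

	pthread_mutex_lock(&wd->lock);
	if (wd->need_ping)
		wd->pings++;
	pthread_mutex_unlock(&wd->lock);
	return NULL;
}

/*
 * Release path in the style of the fix: update state under the lock and
 * return. Waiting for the worker here (for example pthread_join()) while
 * still holding the lock would recreate the reported cycle, because the
 * worker needs the same lock to finish.
 */
void wd_sim_release(struct wd_sim *wd)
{
	pthread_mutex_lock(&wd->lock);
	wd->need_ping = false;
	pthread_mutex_unlock(&wd->lock);
}

/* Runs release first, then a late worker; returns the number of pings. */
int wd_sim_run(void)
{
	struct wd_sim wd = { .need_ping = true, .pings = 0 };
	pthread_t worker;
	int rc;

	pthread_mutex_init(&wd.lock, NULL);
	wd_sim_release(&wd);

	rc = pthread_create(&worker, NULL, wd_sim_worker, &wd);
	assert(rc == 0);
	pthread_join(worker, NULL);	/* joined with the lock not held */

	pthread_mutex_destroy(&wd.lock);
	return wd.pings;
}
```

In this sketch wd_sim_run() returns 0: the worker starts after release has cleared need_ping, so it takes the lock, sees nothing to do, and exits. That mirrors the argument in the patch description that a worker which is already running does no harm.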