From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754297Ab2LYWC6 (ORCPT );
	Tue, 25 Dec 2012 17:02:58 -0500
Received: from Chamillionaire.breakpoint.cc ([80.244.247.6]:42515 "EHLO
	Chamillionaire.breakpoint.cc" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754216Ab2LYWCz (ORCPT );
	Tue, 25 Dec 2012 17:02:55 -0500
Date: Tue, 25 Dec 2012 23:02:48 +0100
From: Sebastian Andrzej Siewior
To: Greg Kroah-Hartman
Cc: Sebastian Andrzej Siewior, Peter Hurley, Alan Cox,
	linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
Subject: [PATCH v3] tty: don't deadlock while flushing workqueue
Message-ID: <20121225220248.GA5130@kibibi>
References: <1353501542-14707-1-git-send-email-bigeasy@linutronix.de>
 <20121121140426.1860d093@pyramind.ukuu.org.uk>
 <20121127095357.GA3536@breakpoint.cc>
 <20121127172249.GB24592@kroah.com>
 <20121127180108.GA7376@linutronix.de>
 <1354556465.2531.169.camel@thor>
 <50BF732C.6030306@linutronix.de>
 <20121205171140.03b6ca06@pyramind.ukuu.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <20121205171140.03b6ca06@pyramind.ukuu.org.uk>
X-Key-Id: 97C4700B
X-Key-Fingerprint: 09E2 D1F3 9A3A FF13 C3D3 961C 0688 1C1E 97C4 700B
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sebastian Andrzej Siewior

Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
in one of my dummy_hcd + g_nokia test cases. The first run was usually
okay, the second often resulted in a lockdep splat, and the third
usually ended in a deadlock.

Lockdep complained about tty->hangup_work and tty->legacy_mutex being
taken in opposite orders:

| ======================================================
| [ INFO: possible circular locking dependency detected ]
| 3.7.0-rc6+ #204 Not tainted
| -------------------------------------------------------
| kworker/2:1/35 is trying to acquire lock:
|  (&tty->legacy_mutex){+.+.+.}, at: [] tty_lock_nested+0x36/0x80
|
| but task is already holding lock:
|  ((&tty->hangup_work)){+.+...}, at: [] process_one_work+0x124/0x5e0
|
| which lock already depends on the new lock.
|
| the existing dependency chain (in reverse order) is:
|
| -> #2 ((&tty->hangup_work)){+.+...}:
|        [] lock_acquire+0x84/0x190
|        [] flush_work+0x3d/0x240
|        [] tty_ldisc_flush_works+0x16/0x30
|        [] tty_ldisc_release+0x21/0x70
|        [] tty_release+0x35c/0x470
|        [] __fput+0xd8/0x270
|        [] ____fput+0xd/0x10
|        [] task_work_run+0xb9/0xf0
|        [] do_notify_resume+0x51/0x80
|        [] work_notifysig+0x35/0x3b
|
| -> #1 (&tty->legacy_mutex/1){+.+...}:
|        [] lock_acquire+0x84/0x190
|        [] mutex_lock_nested+0x6c/0x2f0
|        [] tty_lock_nested+0x36/0x80
|        [] tty_lock_pair+0x29/0x70
|        [] tty_release+0x118/0x470
|        [] __fput+0xd8/0x270
|        [] ____fput+0xd/0x10
|        [] task_work_run+0xb9/0xf0
|        [] do_notify_resume+0x51/0x80
|        [] work_notifysig+0x35/0x3b
|
| -> #0 (&tty->legacy_mutex){+.+.+.}:
|        [] __lock_acquire+0x1189/0x16a0
|        [] lock_acquire+0x84/0x190
|        [] mutex_lock_nested+0x6c/0x2f0
|        [] tty_lock_nested+0x36/0x80
|        [] tty_lock+0xf/0x20
|        [] __tty_hangup+0x54/0x410
|        [] do_tty_hangup+0x12/0x20
|        [] process_one_work+0x1a3/0x5e0
|        [] worker_thread+0x119/0x3a0
|        [] kthread+0x94/0xa0
|        [] ret_from_kernel_thread+0x1b/0x28
|
|other info that might help us debug this:
|
|Chain exists of:
|  &tty->legacy_mutex --> &tty->legacy_mutex/1 --> (&tty->hangup_work)
|
|  Possible unsafe locking scenario:
|
|        CPU0                    CPU1
|        ----                    ----
|   lock((&tty->hangup_work));
|                                lock(&tty->legacy_mutex/1);
|                                lock((&tty->hangup_work));
|   lock(&tty->legacy_mutex);
|
|  *** DEADLOCK ***

Before the mentioned commit, tty_ldisc_release() looked like this:

|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);
|	tty_lock();

As can be seen, it first flushes the workqueue and then grabs the
tty_lock. Now we grab the lock first:

|	tty_lock_pair(tty, o_tty);
|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);

so lockdep's complaint seems valid.
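To see the inversion without any tty code, here is a minimal userspace
sketch of the same pattern (plain pthreads; the names only mirror the
kernel ones and nothing below is kernel code): waiting for a worker to
finish while holding a lock that the worker itself needs can never
complete.

/*
 * Illustrative only, not kernel code: a userspace analogue of the
 * inversion lockdep reports above.  "legacy_mutex" and "hangup_work"
 * merely mirror the kernel names.  Build with: gcc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t legacy_mutex = PTHREAD_MUTEX_INITIALIZER;

/* stands in for do_tty_hangup() running from tty->hangup_work */
static void *hangup_work(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&legacy_mutex);	/* tty_lock() in trace #0 */
	puts("work: got the lock");
	pthread_mutex_unlock(&legacy_mutex);
	return NULL;
}

int main(void)
{
	pthread_t worker;

	/* current ordering: grab the lock first ... */
	pthread_mutex_lock(&legacy_mutex);
	pthread_create(&worker, NULL, hangup_work, NULL);

	/*
	 * ... then "flush" the work while still holding the lock.
	 * The worker blocks on legacy_mutex and the join waits for
	 * the worker, like tty_ldisc_flush_works() called after
	 * tty_lock_pair(): neither side can make progress.
	 */
	pthread_join(worker, NULL);

	pthread_mutex_unlock(&legacy_mutex);
	puts("no deadlock");
	return 0;
}

Creating the worker after taking the lock makes the hang deterministic
here; in the kernel the hangup work races with tty_release(), which is
presumably why only some of my runs locked up. Doing the join before
taking the lock (the equivalent of flushing the works before
tty_lock_pair(), as this patch does) avoids it.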
An earlier version of this patch also took the ldisc_mutex, since the
other user of tty_ldisc_flush_works() (tty_set_ldisc()) does so. Peter
Hurley then said that it should not be required. Since it wasn't taken
here before either, I dropped that part.
The code under tty_ldisc_kill() used to run with the tty lock held, so
the lock is taken again before calling it.

I was able to reproduce the deadlock on v3.8-rc1; this patch fixes the
problem in my test case. I haven't noticed any other problems so far.

Cc: Alan Cox
Cc: Peter Hurley
Signed-off-by: Sebastian Andrzej Siewior
---
 drivers/tty/tty_ldisc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index c578229..78f1be2 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -934,17 +934,17 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
 	 * race with the set_ldisc code path.
 	 */
 
-	tty_lock_pair(tty, o_tty);
 	tty_ldisc_halt(tty);
-	tty_ldisc_flush_works(tty);
-	if (o_tty) {
+	if (o_tty)
 		tty_ldisc_halt(o_tty);
+
+	tty_ldisc_flush_works(tty);
+	if (o_tty)
 		tty_ldisc_flush_works(o_tty);
-	}
 
+	tty_lock_pair(tty, o_tty);
 	/* This will need doing differently if we need to lock */
 	tty_ldisc_kill(tty);
-
 	if (o_tty)
 		tty_ldisc_kill(o_tty);
-- 
1.7.10.4