linux-kernel.vger.kernel.org archive mirror
* [PATCH] tty: don't dead while flushing workqueue
@ 2012-11-21 12:39 Sebastian Andrzej Siewior
  2012-11-21 14:04 ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-11-21 12:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-usb, Sebastian Andrzej Siewior, Alan Cox

Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
in one of my dummy_hcd + g_nokia test cases. The first run was usually
okay, the second often resulted in a splat by lockdep and the third was
usually a deadlock.
Lockdep complained about tty->hangup_work and tty->legacy_mutex taken
both ways:
| ======================================================
| [ INFO: possible circular locking dependency detected ]
| 3.7.0-rc6+ #204 Not tainted
| -------------------------------------------------------
| kworker/2:1/35 is trying to acquire lock:
|  (&tty->legacy_mutex){+.+.+.}, at: [<c14051e6>] tty_lock_nested+0x36/0x80
|
| but task is already holding lock:
|  ((&tty->hangup_work)){+.+...}, at: [<c104f6e4>] process_one_work+0x124/0x5e0
|
| which lock already depends on the new lock.
|
| the existing dependency chain (in reverse order) is:
|
| -> #2 ((&tty->hangup_work)){+.+...}:
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c104d82d>] flush_work+0x3d/0x240
|        [<c12e6986>] tty_ldisc_flush_works+0x16/0x30
|        [<c12e7861>] tty_ldisc_release+0x21/0x70
|        [<c12e0dfc>] tty_release+0x35c/0x470
|        [<c1105e28>] __fput+0xd8/0x270
|        [<c1105fcd>] ____fput+0xd/0x10
|        [<c1051dd9>] task_work_run+0xb9/0xf0
|        [<c1002a51>] do_notify_resume+0x51/0x80
|        [<c140550a>] work_notifysig+0x35/0x3b
|
| -> #1 (&tty->legacy_mutex/1){+.+...}:
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c140276c>] mutex_lock_nested+0x6c/0x2f0
|        [<c14051e6>] tty_lock_nested+0x36/0x80
|        [<c1405279>] tty_lock_pair+0x29/0x70
|        [<c12e0bb8>] tty_release+0x118/0x470
|        [<c1105e28>] __fput+0xd8/0x270
|        [<c1105fcd>] ____fput+0xd/0x10
|        [<c1051dd9>] task_work_run+0xb9/0xf0
|        [<c1002a51>] do_notify_resume+0x51/0x80
|        [<c140550a>] work_notifysig+0x35/0x3b
|
| -> #0 (&tty->legacy_mutex){+.+.+.}:
|        [<c107f3c9>] __lock_acquire+0x1189/0x16a0
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c140276c>] mutex_lock_nested+0x6c/0x2f0
|        [<c14051e6>] tty_lock_nested+0x36/0x80
|        [<c140523f>] tty_lock+0xf/0x20
|        [<c12df8e4>] __tty_hangup+0x54/0x410
|        [<c12dfcb2>] do_tty_hangup+0x12/0x20
|        [<c104f763>] process_one_work+0x1a3/0x5e0
|        [<c104fec9>] worker_thread+0x119/0x3a0
|        [<c1055084>] kthread+0x94/0xa0
|        [<c140ca37>] ret_from_kernel_thread+0x1b/0x28
|
|other info that might help us debug this:
|
|Chain exists of:
|  &tty->legacy_mutex --> &tty->legacy_mutex/1 --> (&tty->hangup_work)
|
| Possible unsafe locking scenario:
|
|       CPU0                    CPU1
|       ----                    ----
|  lock((&tty->hangup_work));
|                               lock(&tty->legacy_mutex/1);
|                               lock((&tty->hangup_work));
|  lock(&tty->legacy_mutex);
|
| *** DEADLOCK ***
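
To make the report easier to follow, here is a minimal userspace sketch of
the kind of check behind it: record "B was acquired while A was held" as an
edge, and flag a new acquisition if it closes a cycle. This is not lockdep's
code; the enum names, struct dep_graph and all function names are
hypothetical and only mirror the lock classes named in the splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { LEGACY_MUTEX, LEGACY_MUTEX_NESTED, HANGUP_WORK, NR_CLASSES };

/* dep[a][b] is true when b was acquired while a was held. */
struct dep_graph {
	bool dep[NR_CLASSES][NR_CLASSES];
};

static void add_dependency(struct dep_graph *g, enum lock_class held,
			   enum lock_class acquired)
{
	g->dep[held][acquired] = true;
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reaches(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/*
 * Acquiring 'acquired' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'acquired'.
 */
static bool would_deadlock(const struct dep_graph *g, enum lock_class held,
			   enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	return reaches(g, acquired, held, seen);
}

/* Replays the chain from the report, then the kworker's acquisition. */
bool report_chain_has_cycle(void)
{
	struct dep_graph g = { 0 };

	/* legacy_mutex --> legacy_mutex/1 --> hangup_work */
	add_dependency(&g, LEGACY_MUTEX, LEGACY_MUTEX_NESTED);
	add_dependency(&g, LEGACY_MUTEX_NESTED, HANGUP_WORK);

	/* kworker holds hangup_work and wants legacy_mutex */
	return would_deadlock(&g, HANGUP_WORK, LEGACY_MUTEX);
}
```

In this model the kworker's attempt to take legacy_mutex while running
hangup_work is the edge that closes the loop.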

Before the commit mentioned above, tty_ldisc_release() looked like this:

|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);
|	tty_lock();

As can be seen, it first flushed the workqueue and then grabbed the
tty_lock. Now we grab the lock first:

|	tty_lock_pair(tty, o_tty);
|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);

so lockdep's complaint seems valid.
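
The two orderings above can be tried out in userspace. The following is a
rough sketch with pthreads, not kernel code: tty_lock_model,
hangup_work_model() and run_release() are hypothetical names,
pthread_join() stands in for flush_work(), and the work item uses trylock
so that the sketch reports the hazard instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for tty->legacy_mutex. */
static pthread_mutex_t tty_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the hangup work item: it needs the lock to make progress. */
static void *hangup_work_model(void *arg)
{
	bool *got_lock = arg;

	/* trylock, not lock: a blocking lock would deadlock in the bad order */
	if (pthread_mutex_trylock(&tty_lock_model) == 0) {
		*got_lock = true;
		pthread_mutex_unlock(&tty_lock_model);
	} else {
		*got_lock = false;
	}
	return NULL;
}

/*
 * Runs one "release". Returns true if the work item was able to take the
 * lock, i.e. if a blocking lock in the work item would have been safe.
 */
bool run_release(bool lock_before_flush)
{
	pthread_t worker;
	bool got_lock = false;
	int err;

	if (lock_before_flush)
		pthread_mutex_lock(&tty_lock_model);

	err = pthread_create(&worker, NULL, hangup_work_model, &got_lock);
	assert(err == 0);
	err = pthread_join(worker, NULL);	/* models flush_work() */
	assert(err == 0);
	(void)err;

	if (!lock_before_flush)
		pthread_mutex_lock(&tty_lock_model);

	/* the part that needs the lock would run here */
	pthread_mutex_unlock(&tty_lock_model);
	return got_lock;
}
```

run_release(true) models lock-then-flush and leaves the work item unable to
get the lock; run_release(false) models flush-then-lock and lets it through.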

The other user of tty_ldisc_flush_works() is tty_set_ldisc() and I tried
to mimic its logic:
- grab tty lock
- grab ldisc_mutex lock
- release the tty lock
- call tty_ldisc_halt()
- release ldisc_mutex
- call tty_ldisc_flush_works()
The code in tty_ldisc_kill() was previously executed with the tty lock
held, so the lock is taken again before calling it.
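
The sequence listed above can be written down as a trace and checked against
the one rule that matters here: the flush must not run while the tty lock is
held. This is only an illustrative model; the step names and
flush_outside_tty_lock() are hypothetical, and the trace stops at the kill
step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names for this sketch; they mirror the list above. */
enum step {
	TTY_LOCK, TTY_UNLOCK,
	LDISC_MUTEX_LOCK, LDISC_MUTEX_UNLOCK,
	LDISC_HALT, FLUSH_WORKS, LDISC_KILL,
};

/* Returns true if no FLUSH_WORKS step runs while the tty lock is held. */
bool flush_outside_tty_lock(const enum step *seq, size_t len)
{
	bool tty_locked = false;

	for (size_t i = 0; i < len; i++) {
		switch (seq[i]) {
		case TTY_LOCK:
			tty_locked = true;
			break;
		case TTY_UNLOCK:
			tty_locked = false;
			break;
		case FLUSH_WORKS:
			if (tty_locked)
				return false;
			break;
		default:
			break;
		}
	}
	return true;
}

/* Ordering that triggers the lockdep report: lock, halt, flush. */
const enum step regressed_order[] = { TTY_LOCK, LDISC_HALT, FLUSH_WORKS };

/* Ordering from the list above, followed by the re-lock before the kill. */
const enum step patched_order[] = {
	TTY_LOCK, LDISC_MUTEX_LOCK, TTY_UNLOCK, LDISC_HALT,
	LDISC_MUTEX_UNLOCK, FLUSH_WORKS, TTY_LOCK, LDISC_KILL,
};
```

The checker says nothing about whether ldisc_mutex is needed at all; it only
confirms that the flush has moved outside the tty lock.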

I don't see any problems in my testcase.

Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/tty/tty_ldisc.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 0f2a2c5..fb76818 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -930,16 +930,21 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
 	 */
 
 	tty_lock_pair(tty, o_tty);
+	mutex_lock(&tty->ldisc_mutex);
+	tty_unlock_pair(tty, o_tty);
+
 	tty_ldisc_halt(tty);
-	tty_ldisc_flush_works(tty);
-	if (o_tty) {
+	if (o_tty)
 		tty_ldisc_halt(o_tty);
+	mutex_unlock(&tty->ldisc_mutex);
+
+	tty_ldisc_flush_works(tty);
+	if (o_tty)
 		tty_ldisc_flush_works(o_tty);
-	}
 
+	tty_lock_pair(tty, o_tty);
 	/* This will need doing differently if we need to lock */
 	tty_ldisc_kill(tty);
-
 	if (o_tty)
 		tty_ldisc_kill(o_tty);
 
-- 
1.7.10.4



* Re: [PATCH] tty: don't dead while flushing workqueue
  2012-11-21 12:39 [PATCH] tty: don't dead while flushing workqueue Sebastian Andrzej Siewior
@ 2012-11-21 14:04 ` Alan Cox
  2012-11-27  9:53   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2012-11-21 14:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Greg Kroah-Hartman, linux-kernel, linux-usb, Alan Cox

> I don't see any problems in my testcase.

This looks fine to me as by the time we call tty_ldisc_release we have
already set TTY_CLOSING on both sides.

Alan


* Re: [PATCH] tty: don't dead while flushing workqueue
  2012-11-21 14:04 ` Alan Cox
@ 2012-11-27  9:53   ` Sebastian Andrzej Siewior
  2012-11-27 17:22     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-11-27  9:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sebastian Andrzej Siewior, Alan Cox, linux-kernel, linux-usb, Alan Cox

On Wed, Nov 21, 2012 at 02:04:26PM +0000, Alan Cox wrote:
> > I don't see any problems in my testcase.
> 
> This looks fine to me as by the time we call tty_ldisc_release we have
> already set TTY_CLOSING on both sides.

Greg, can you push this into v3.7? This regression was introduced in
v3.7-rc1. If you don't consider it that important since I'm the only one
complaining, could you please add a stable tag once you apply it, unless
you want me to resend it with a stable tag?

> Alan

Sebastian


* Re: [PATCH] tty: don't dead while flushing workqueue
  2012-11-27  9:53   ` Sebastian Andrzej Siewior
@ 2012-11-27 17:22     ` Greg Kroah-Hartman
  2012-11-27 18:01       ` [PATCH RESEND] tty: don't dead lock " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2012-11-27 17:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Sebastian Andrzej Siewior, Alan Cox, linux-kernel, linux-usb, Alan Cox

On Tue, Nov 27, 2012 at 10:53:57AM +0100, Sebastian Andrzej Siewior wrote:
> On Wed, Nov 21, 2012 at 02:04:26PM +0000, Alan Cox wrote:
> > > I don't see any problems in my testcase.
> > 
> > This looks fine to me as by the time we call tty_ldisc_release we have
> > already set TTY_CLOSING on both sides.
> 
> Greg, can you push this into v3.7? This regression was introduced in
> v3.7-rc1. If you don't consider it that important since I'm the only one
> complaining, could you please add a stable tag once you apply it, unless
> you want me to resend it with a stable tag?

I don't see this patch anywhere in my queue, or in the tty-next tree, so
someone is going to have to resend it please.

And yes, it's a bit too late for 3.7, but I don't have an issue with
merging it for 3.8-rc1 and tagging it for 3.7-stable.

thanks,

greg k-h


* [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-11-27 17:22     ` Greg Kroah-Hartman
@ 2012-11-27 18:01       ` Sebastian Andrzej Siewior
  2012-11-30 17:09         ` Sebastian Andrzej Siewior
  2012-12-03 17:41         ` Peter Hurley
  0 siblings, 2 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-11-27 18:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Alan Cox, linux-kernel, linux-usb, Alan Cox

Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
in one of my dummy_hcd + g_nokia test cases. The first run was usually
okay, the second often resulted in a splat by lockdep and the third was
usually a deadlock.
Lockdep complained about tty->hangup_work and tty->legacy_mutex taken
both ways:
| ======================================================
| [ INFO: possible circular locking dependency detected ]
| 3.7.0-rc6+ #204 Not tainted
| -------------------------------------------------------
| kworker/2:1/35 is trying to acquire lock:
|  (&tty->legacy_mutex){+.+.+.}, at: [<c14051e6>] tty_lock_nested+0x36/0x80
|
| but task is already holding lock:
|  ((&tty->hangup_work)){+.+...}, at: [<c104f6e4>] process_one_work+0x124/0x5e0
|
| which lock already depends on the new lock.
|
| the existing dependency chain (in reverse order) is:
|
| -> #2 ((&tty->hangup_work)){+.+...}:
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c104d82d>] flush_work+0x3d/0x240
|        [<c12e6986>] tty_ldisc_flush_works+0x16/0x30
|        [<c12e7861>] tty_ldisc_release+0x21/0x70
|        [<c12e0dfc>] tty_release+0x35c/0x470
|        [<c1105e28>] __fput+0xd8/0x270
|        [<c1105fcd>] ____fput+0xd/0x10
|        [<c1051dd9>] task_work_run+0xb9/0xf0
|        [<c1002a51>] do_notify_resume+0x51/0x80
|        [<c140550a>] work_notifysig+0x35/0x3b
|
| -> #1 (&tty->legacy_mutex/1){+.+...}:
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c140276c>] mutex_lock_nested+0x6c/0x2f0
|        [<c14051e6>] tty_lock_nested+0x36/0x80
|        [<c1405279>] tty_lock_pair+0x29/0x70
|        [<c12e0bb8>] tty_release+0x118/0x470
|        [<c1105e28>] __fput+0xd8/0x270
|        [<c1105fcd>] ____fput+0xd/0x10
|        [<c1051dd9>] task_work_run+0xb9/0xf0
|        [<c1002a51>] do_notify_resume+0x51/0x80
|        [<c140550a>] work_notifysig+0x35/0x3b
|
| -> #0 (&tty->legacy_mutex){+.+.+.}:
|        [<c107f3c9>] __lock_acquire+0x1189/0x16a0
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c140276c>] mutex_lock_nested+0x6c/0x2f0
|        [<c14051e6>] tty_lock_nested+0x36/0x80
|        [<c140523f>] tty_lock+0xf/0x20
|        [<c12df8e4>] __tty_hangup+0x54/0x410
|        [<c12dfcb2>] do_tty_hangup+0x12/0x20
|        [<c104f763>] process_one_work+0x1a3/0x5e0
|        [<c104fec9>] worker_thread+0x119/0x3a0
|        [<c1055084>] kthread+0x94/0xa0
|        [<c140ca37>] ret_from_kernel_thread+0x1b/0x28
|
|other info that might help us debug this:
|
|Chain exists of:
|  &tty->legacy_mutex --> &tty->legacy_mutex/1 --> (&tty->hangup_work)
|
| Possible unsafe locking scenario:
|
|       CPU0                    CPU1
|       ----                    ----
|  lock((&tty->hangup_work));
|                               lock(&tty->legacy_mutex/1);
|                               lock((&tty->hangup_work));
|  lock(&tty->legacy_mutex);
|
| *** DEADLOCK ***

Before the commit mentioned above, tty_ldisc_release() looked like this:

|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);
|	tty_lock();

As can be seen, it first flushed the workqueue and then grabbed the
tty_lock. Now we grab the lock first:

|	tty_lock_pair(tty, o_tty);
|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);

so lockdep's complaint seems valid.

The other user of tty_ldisc_flush_works() is tty_set_ldisc() and I tried
to mimic its logic:
- grab tty lock
- grab ldisc_mutex lock
- release the tty lock
- call tty_ldisc_halt()
- release ldisc_mutex
- call tty_ldisc_flush_works()
The code in tty_ldisc_kill() was previously executed with the tty lock
held, so the lock is taken again before calling it.

I don't see any problems in my testcase.

Cc: stable@vger.kernel.org #v3.7
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Greg, here is the resend. I added Acked-By Alan Cox because he wrote

|This looks fine to me as by the time we call tty_ldisc_release we have
|already set TTY_CLOSING on both sides.

See http://lkml.org/lkml/2012/11/21/347

 drivers/tty/tty_ldisc.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 0f2a2c5..fb76818 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -930,16 +930,21 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
 	 */
 
 	tty_lock_pair(tty, o_tty);
+	mutex_lock(&tty->ldisc_mutex);
+	tty_unlock_pair(tty, o_tty);
+
 	tty_ldisc_halt(tty);
-	tty_ldisc_flush_works(tty);
-	if (o_tty) {
+	if (o_tty)
 		tty_ldisc_halt(o_tty);
+	mutex_unlock(&tty->ldisc_mutex);
+
+	tty_ldisc_flush_works(tty);
+	if (o_tty)
 		tty_ldisc_flush_works(o_tty);
-	}
 
+	tty_lock_pair(tty, o_tty);
 	/* This will need doing differently if we need to lock */
 	tty_ldisc_kill(tty);
-
 	if (o_tty)
 		tty_ldisc_kill(o_tty);
 
-- 
1.7.10.4



* Re: [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-11-27 18:01       ` [PATCH RESEND] tty: don't dead lock " Sebastian Andrzej Siewior
@ 2012-11-30 17:09         ` Sebastian Andrzej Siewior
  2012-11-30 17:21           ` Greg Kroah-Hartman
  2012-12-03 17:41         ` Peter Hurley
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-11-30 17:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Alan Cox, linux-kernel, linux-usb, Alan Cox

On Tue, Nov 27, 2012 at 07:01:08PM +0100, Sebastian Andrzej Siewior wrote:
> Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
> in one of my dummy_hcd + g_nokia test cases. The first run was usually
> okay, the second often resulted in a splat by lockdep and the third was
> usually a deadlock.

Ping. Can you feed this to your tty tree? :)

Sebastian


* Re: [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-11-30 17:09         ` Sebastian Andrzej Siewior
@ 2012-11-30 17:21           ` Greg Kroah-Hartman
  2012-11-30 18:11             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2012-11-30 17:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Alan Cox, linux-kernel, linux-usb, Alan Cox

On Fri, Nov 30, 2012 at 06:09:38PM +0100, Sebastian Andrzej Siewior wrote:
> On Tue, Nov 27, 2012 at 07:01:08PM +0100, Sebastian Andrzej Siewior wrote:
> > Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
> > in one of my dummy_hcd + g_nokia test cases. The first run was usually
> > okay, the second often resulted in a splat by lockdep and the third was
> > usually a deadlock.
> 
> Ping. Can you feed this to your tty tree? :)

It's really late in the release cycle, I would like to have this get
more testing in linux-next before I send it to Linus, so I was going to
wait until after 3.8-rc1 is out before doing it.

I'm doing the same thing for all tty/serial patches right now, so don't
feel like I'm picking on you :)

thanks,

greg k-h


* Re: [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-11-30 17:21           ` Greg Kroah-Hartman
@ 2012-11-30 18:11             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-11-30 18:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Alan Cox, linux-kernel, linux-usb, Alan Cox

On Fri, Nov 30, 2012 at 09:21:43AM -0800, Greg Kroah-Hartman wrote:
> > Ping. Can you feed this to your tty tree? :)
> 
> It's really late in the release cycle, I would like to have this get
> more testing in linux-next before I send it to Linus, so I was going to
> wait until after 3.8-rc1 is out before doing it.

I assumed that you would apply this to your tty-next tree so it appears
in linux-next. So now I'll wait until -rc2 is out and ping again if
nothing happens :)

> I'm doing the same thing for all tty/serial patches right now, so don't
> feel like I'm picking on you :)

Next time I'll look for a bug that annoys everyone :)
Nah. I saw some movement in tty-next so I thought I'd ping you. But staging
stuff is probably a different category.

> 
> thanks,
> 
> greg k-h

Sebastian


* Re: [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-11-27 18:01       ` [PATCH RESEND] tty: don't dead lock " Sebastian Andrzej Siewior
  2012-11-30 17:09         ` Sebastian Andrzej Siewior
@ 2012-12-03 17:41         ` Peter Hurley
  2012-12-05 16:15           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Hurley @ 2012-12-03 17:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Greg Kroah-Hartman, Alan Cox, linux-kernel, linux-usb, Alan Cox

On Tue, 2012-11-27 at 19:01 +0100, Sebastian Andrzej Siewior wrote:
> Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
> in one of my dummy_hcd + g_nokia test cases. The first run was usually
> okay, the second often resulted in a splat by lockdep and the third was
> usually a deadlock.
....
> 
> Before the commit mentioned above, tty_ldisc_release() looked like this:
> 
> |	tty_ldisc_halt(tty);
> |	tty_ldisc_flush_works(tty);
> |	tty_lock();
> 
> As can be seen, it first flushed the workqueue and then grabbed the
> tty_lock. Now we grab the lock first:
> 
> |	tty_lock_pair(tty, o_tty);
> |	tty_ldisc_halt(tty);
> |	tty_ldisc_flush_works(tty);
> 
> so lockdep's complaint seems valid.
> 
> The other user of tty_ldisc_flush_works() is tty_set_ldisc() and I tried
> to mimic its logic:

The lock logic for tty_set_ldisc() is wrong. Despite existing code in
tty_set_ldisc() and tty_ldisc_hangup(), the ldisc_mutex does **not**
(and should not) play a role in acquiring or releasing ldisc references.
The only thing that needs to happen here is below (don't actually use
below because I just hand-edited it):

> See http://lkml.org/lkml/2012/11/21/347
> 
>  drivers/tty/tty_ldisc.c |   13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
> index 0f2a2c5..fb76818 100644
> --- a/drivers/tty/tty_ldisc.c
> +++ b/drivers/tty/tty_ldisc.c
> @@ -930,16 +930,21 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
>  	 */
>  
> - 	tty_lock_pair(tty, o_tty);
>  	tty_ldisc_halt(tty);
> 	tty_ldisc_flush_works(tty);

 
> +	tty_lock_pair(tty, o_tty);
>  	/* This will need doing differently if we need to lock */
>  	tty_ldisc_kill(tty);
> -
>  	if (o_tty)
>  		tty_ldisc_kill(o_tty);
>  




* Re: [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-12-03 17:41         ` Peter Hurley
@ 2012-12-05 16:15           ` Sebastian Andrzej Siewior
  2012-12-05 17:11             ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-12-05 16:15 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Greg Kroah-Hartman, Alan Cox, linux-kernel, linux-usb, Alan Cox

On 12/03/2012 06:41 PM, Peter Hurley wrote:
> The lock logic for tty_set_ldisc() is wrong. Despite existing code in
> tty_set_ldisc() and tty_ldisc_hangup(), the ldisc_mutex does **not**
> (and should not) play a role in acquiring or releasing ldisc references.
> The only thing that needs to happen here is below (don't actually use
> below because I just hand-edited it):

Hmm. How about I stay in sync with the code that is already in the tree,
and the wrong locking gets removed in both places later on?

Alan, what do you prefer?

>> See http://lkml.org/lkml/2012/11/21/347
>>
>>   drivers/tty/tty_ldisc.c |   13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
>> index 0f2a2c5..fb76818 100644
>> --- a/drivers/tty/tty_ldisc.c
>> +++ b/drivers/tty/tty_ldisc.c
>> @@ -930,16 +930,21 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
>>   	 */
>>
>> - 	tty_lock_pair(tty, o_tty);
>>   	tty_ldisc_halt(tty);
>> 	tty_ldisc_flush_works(tty);
>
>
>> +	tty_lock_pair(tty, o_tty);
>>   	/* This will need doing differently if we need to lock */
>>   	tty_ldisc_kill(tty);
>> -
>>   	if (o_tty)
>>   		tty_ldisc_kill(o_tty);
>>

Sebastian


* Re: [PATCH RESEND] tty: don't dead lock while flushing workqueue
  2012-12-05 16:15           ` Sebastian Andrzej Siewior
@ 2012-12-05 17:11             ` Alan Cox
  2012-12-25 22:02               ` [PATCH v3] tty: don't deadlock " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2012-12-05 17:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Hurley, Greg Kroah-Hartman, linux-kernel, linux-usb

On Wed, 05 Dec 2012 17:15:40 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 12/03/2012 06:41 PM, Peter Hurley wrote:
> > The lock logic for tty_set_ldisc() is wrong. Despite existing code in
> > tty_set_ldisc() and tty_ldisc_hangup(), the ldisc_mutex does **not**
> > (and should not) play a role in acquiring or releasing ldisc references.
> > The only thing that needs to happen here is below (don't actually use
> > below because I just hand-edited it):
> 
> Hmm. How about I stay in sync with the code that is already in the tree,
> and the wrong locking gets removed in both places later on?
> 
> Alan, what do you prefer?

So long as it ends up right I don't care 8)


* [PATCH v3] tty: don't deadlock while flushing workqueue
  2012-12-05 17:11             ` Alan Cox
@ 2012-12-25 22:02               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2012-12-25 22:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sebastian Andrzej Siewior, Peter Hurley, Alan Cox, linux-kernel,
	linux-usb

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock
in one of my dummy_hcd + g_nokia test cases. The first run was usually
okay, the second often resulted in a splat by lockdep and the third was
usually a deadlock.
Lockdep complained about tty->hangup_work and tty->legacy_mutex taken
both ways:
| ======================================================
| [ INFO: possible circular locking dependency detected ]
| 3.7.0-rc6+ #204 Not tainted
| -------------------------------------------------------
| kworker/2:1/35 is trying to acquire lock:
|  (&tty->legacy_mutex){+.+.+.}, at: [<c14051e6>] tty_lock_nested+0x36/0x80
|
| but task is already holding lock:
|  ((&tty->hangup_work)){+.+...}, at: [<c104f6e4>] process_one_work+0x124/0x5e0
|
| which lock already depends on the new lock.
|
| the existing dependency chain (in reverse order) is:
|
| -> #2 ((&tty->hangup_work)){+.+...}:
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c104d82d>] flush_work+0x3d/0x240
|        [<c12e6986>] tty_ldisc_flush_works+0x16/0x30
|        [<c12e7861>] tty_ldisc_release+0x21/0x70
|        [<c12e0dfc>] tty_release+0x35c/0x470
|        [<c1105e28>] __fput+0xd8/0x270
|        [<c1105fcd>] ____fput+0xd/0x10
|        [<c1051dd9>] task_work_run+0xb9/0xf0
|        [<c1002a51>] do_notify_resume+0x51/0x80
|        [<c140550a>] work_notifysig+0x35/0x3b
|
| -> #1 (&tty->legacy_mutex/1){+.+...}:
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c140276c>] mutex_lock_nested+0x6c/0x2f0
|        [<c14051e6>] tty_lock_nested+0x36/0x80
|        [<c1405279>] tty_lock_pair+0x29/0x70
|        [<c12e0bb8>] tty_release+0x118/0x470
|        [<c1105e28>] __fput+0xd8/0x270
|        [<c1105fcd>] ____fput+0xd/0x10
|        [<c1051dd9>] task_work_run+0xb9/0xf0
|        [<c1002a51>] do_notify_resume+0x51/0x80
|        [<c140550a>] work_notifysig+0x35/0x3b
|
| -> #0 (&tty->legacy_mutex){+.+.+.}:
|        [<c107f3c9>] __lock_acquire+0x1189/0x16a0
|        [<c107fe74>] lock_acquire+0x84/0x190
|        [<c140276c>] mutex_lock_nested+0x6c/0x2f0
|        [<c14051e6>] tty_lock_nested+0x36/0x80
|        [<c140523f>] tty_lock+0xf/0x20
|        [<c12df8e4>] __tty_hangup+0x54/0x410
|        [<c12dfcb2>] do_tty_hangup+0x12/0x20
|        [<c104f763>] process_one_work+0x1a3/0x5e0
|        [<c104fec9>] worker_thread+0x119/0x3a0
|        [<c1055084>] kthread+0x94/0xa0
|        [<c140ca37>] ret_from_kernel_thread+0x1b/0x28
|
|other info that might help us debug this:
|
|Chain exists of:
|  &tty->legacy_mutex --> &tty->legacy_mutex/1 --> (&tty->hangup_work)
|
| Possible unsafe locking scenario:
|
|       CPU0                    CPU1
|       ----                    ----
|  lock((&tty->hangup_work));
|                               lock(&tty->legacy_mutex/1);
|                               lock((&tty->hangup_work));
|  lock(&tty->legacy_mutex);
|
| *** DEADLOCK ***

Before the commit mentioned above, tty_ldisc_release() looked like this:

|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);
|	tty_lock();

As can be seen, it first flushed the workqueue and then grabbed the
tty_lock. Now we grab the lock first:

|	tty_lock_pair(tty, o_tty);
|	tty_ldisc_halt(tty);
|	tty_ldisc_flush_works(tty);

so lockdep's complaint seems valid.

The earlier version of this patch took the ldisc_mutex since the other
user of tty_ldisc_flush_works() (tty_set_ldisc()) did this.
Peter Hurley then said that it should not be required. Since it
wasn't done before the regression either, I dropped this part.
The code in tty_ldisc_kill() was previously executed with the tty lock
held, so the lock is taken again before calling it.

I was able to reproduce the deadlock on v3.8-rc1; this patch fixes the
problem in my testcase. I haven't noticed any problems so far.

Cc: Alan Cox <alan@linux.intel.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/tty/tty_ldisc.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index c578229..78f1be2 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -934,17 +934,17 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
 	 * race with the set_ldisc code path.
 	 */
 
-	tty_lock_pair(tty, o_tty);
 	tty_ldisc_halt(tty);
-	tty_ldisc_flush_works(tty);
-	if (o_tty) {
+	if (o_tty)
 		tty_ldisc_halt(o_tty);
+
+	tty_ldisc_flush_works(tty);
+	if (o_tty)
 		tty_ldisc_flush_works(o_tty);
-	}
 
+	tty_lock_pair(tty, o_tty);
 	/* This will need doing differently if we need to lock */
 	tty_ldisc_kill(tty);
-
 	if (o_tty)
 		tty_ldisc_kill(o_tty);
 
-- 
1.7.10.4

