From: Dmitry Vyukov <dvyukov@google.com>
To: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Jiri Slaby <jslaby@suse.com>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	J Freyensee <james_p_freyensee@linux.intel.com>,
	syzkaller <syzkaller@googlegroups.com>,
	Kostya Serebryany <kcc@google.com>,
	Alexander Potapenko <glider@google.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: tty: deadlock between n_tracerouter_receivebuf and flush_to_ldisc
Date: Thu, 4 Feb 2016 13:39:26 +0100
Message-ID: <CACT4Y+b4nrcummmXvnQ2YBOsujDam1TY5_0MJ97Ab2RNRO=U6Q@mail.gmail.com>
In-Reply-To: <56B25050.9070003@hurleysoftware.com>

On Wed, Feb 3, 2016 at 8:09 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
> On 02/03/2016 09:32 AM, Dmitry Vyukov wrote:
>> On Wed, Feb 3, 2016 at 5:24 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> Hi Dmitry,
>>>
>>> On 01/21/2016 09:43 AM, Peter Hurley wrote:
>>>> On 01/21/2016 02:06 AM, Dmitry Vyukov wrote:
>>>>> On Wed, Jan 20, 2016 at 5:08 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>>>>> On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
>>>>>>> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
>>>>>>>> -> #3 (&buf->lock){+.+...}:
>>>>>>>>        [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 kernel/locking/lockdep.c:3585
>>>>>>>>        [<     inline     >] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:112
>>>>>>>>        [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159
>>>>>>>>        [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 drivers/tty/tty_io.c:2502
>>>>>>>
>>>>>>> So in any recent code that I look at this function tries to acquire
>>>>>>> tty->ctrl_lock, not buf->lock. Am I missing something ?!
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>> The tty locks were annotated with __lockfunc so were being elided from lockdep
>>>>>> stacktraces. Greg has a patch in his queue from me that removes the __lockfunc
>>>>>> annotation ("tty: Remove __lockfunc annotation from tty lock functions").
>>>>>>
>>>>>> Unfortunately, I think syzkaller's post-processing stack trace isn't helping
>>>>>> either, giving the impression that the stack is still inside tty_get_pgrp().
>>>>>>
>>>>>> It's not.
>>>>>
>>>>> I've got a new report on commit
>>>>> a200dcb34693084e56496960d855afdeaaf9578f  (Jan 18).
>>>>> Here is unprocessed version:
>>>>> https://gist.githubusercontent.com/dvyukov/428a0c9bfaa867d8ce84/raw/0754db31668602ad07947f9964238b2f9cf63315/gistfile1.txt
>>>>> and here is processed one:
>>>>> https://gist.githubusercontent.com/dvyukov/42b874213de82d94c35e/raw/2bbced252035821243678de0112e2ed3a766fb5d/gistfile1.txt
>>>>>
>>>>> Peter, what exactly is wrong with the post-processed version?
>>>>
>>>> Yeah, ok, I assumed the problem with this report was post-processing
>>>> because of the other report that had mixed-up info.
>>>>
>>>> However, the #3 stacktrace is obviously wrong, as others have already noted.
>>>> Plus, the #1 stacktrace is wrong as well.
>>>>
>>>>> I would be interested in fixing the processing script.
>>>>
>>>> Not that it's related (since the original, not-edited report has bogus
>>>> stacktraces), but how are you doing debug symbol lookup?
>>>>
>>>> Because below is not correct. Should be kernel/kthread.c:177 (or thereabouts)
>>>>
>>>>        [<ffffffff813b423f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
>>>>
>>>>
>>>>> As far as I see it contains the same stacks just with line numbers and
>>>>> inlined frames.
>>>>
>>>> Agree, now that I see the original report.
>>>>
>>>>> I am using a significantly different compilation mode
>>>>> (kasan + kcov + very recent gcc), so nobody except me won't be able to
>>>>> figure out line numbers based on offsets.
>>>>
>>>> Weird. Maybe something to do with the compiler.
>>>>
>>>> Can you get me the dmesg output running the patch below?
>>>
>>> Wondering if this is still the priority it was not so long ago?
>>> If not, that's fine and I'll drop this from my followup list.
>>
>>
>> Yes, it is still the priority for me.
>> I've tried to apply your debugging patch, but I noticed that it prints
>> dependencies stacks as it discovers them.
>
> Yeah, that's the point; I need to understand why lockdep doesn't
> store the correct stack trace at dependency discovery.
>
> Since the correct stack trace will be printed instead, it will help
> debug the lockdep problem.
>
> Hopefully, once the problem with the bad stacktraces are fixed, the
> actual circular lock dependencies will be clear.
>
>> But in my setup I don't have
>> all output from machine start (there is just too many of it).
>
> Kernel parameter:
>
>         log_buf_len=1G
>
>
>> And I don't have a localized reproducer for this.
>
> I really just need the lockdep dependency stacks generated during boot,
> and the ctrl+C in a terminal window to trigger one of the dependency
> stacks.
>
>> I will try again.
>
> Ok.
>
>> Do you want me to debug with your "tty: Fix lock inversion in
>> N_TRACEROUTER" patch applied or not (I still see slightly different
>> deadlock reports with it)?
>
> Not.
>
> I think that probably does fix at least one circular dependency, but
> I want to figure out the bad stack trace problem first.
>
> There's probably another circular dependency there, as indicated by
> your other report.


Here is debug output:
https://gist.githubusercontent.com/dvyukov/b18181c849fdd3d51c80/raw/e91ead683fec020f64eed6750aa9f6347d43b9f9/gistfile1.txt

In particular the ctrl+C dependency is:

 new dependency:  (&o_tty->termios_rwsem/1){++++..} =>  (&buf->lock){+.+...}
[  216.817400] Call Trace:
[  216.817400]  [<ffffffff82be450d>] dump_stack+0x6f/0xa2
[  216.817400]  [<ffffffff8145b149>] __lock_acquire+0x4859/0x5710
[  216.817400]  [<ffffffff8145e61c>] lock_acquire+0x1dc/0x430
[  216.817400]  [<ffffffff86656871>] mutex_lock_nested+0xb1/0xa50
[  216.817400]  [<ffffffff82f9f08f>] tty_buffer_flush+0xbf/0x3c0
[  216.817400]  [<ffffffff82fa330c>] pty_flush_buffer+0x5c/0x180
[  216.817400]  [<ffffffff82f97a05>] tty_driver_flush_buffer+0x65/0x80
[  216.817400]  [<ffffffff82f8d162>] isig+0x172/0x2c0
[  216.817400]  [<ffffffff82f8fe52>] n_tty_receive_signal_char+0x22/0xf0
[  216.817400]  [<ffffffff82f93a4e>] n_tty_receive_char_special+0x126e/0x2b30
[  216.817400]  [<ffffffff82f96cb3>] n_tty_receive_buf_common+0x19a3/0x2400
[  216.817400]  [<ffffffff82f97743>] n_tty_receive_buf2+0x33/0x40
[  216.817400]  [<ffffffff82f9e83f>] flush_to_ldisc+0x3bf/0x7f0
[  216.817400]  [<ffffffff813a3eb6>] process_one_work+0x796/0x1440
[  216.817400]  [<ffffffff813a4c3b>] worker_thread+0xdb/0xfc0
[  216.817400]  [<ffffffff813b7edf>] kthread+0x23f/0x2d0
[  216.817400]  [<ffffffff866608ef>] ret_from_fork+0x3f/0x70

While in the report it still looks like this:

-> #3 (&buf->lock){+.+...}:
[ 1544.187872]        [<ffffffff8145e61c>] lock_acquire+0x1dc/0x430
[ 1544.187872]        [<ffffffff8665fecf>] _raw_spin_lock_irqsave+0x9f/0xd0
[ 1544.187872]        [<ffffffff82f7c810>] tty_get_pgrp+0x20/0x80
[ 1544.187872]        [<ffffffff82f8afca>] __isig+0x1a/0x50
[ 1544.187872]        [<ffffffff82f8d09e>] isig+0xae/0x2c0
[ 1544.187872]        [<ffffffff82f8fe52>] n_tty_receive_signal_char+0x22/0xf0
[ 1544.187872]        [<ffffffff82f93a6d>] n_tty_receive_char_special+0x128d/0x2b30
[ 1544.187872]        [<ffffffff82f96cb3>] n_tty_receive_buf_common+0x19a3/0x2400
[ 1544.187872]        [<ffffffff82f97743>] n_tty_receive_buf2+0x33/0x40
[ 1544.187872]        [<ffffffff82f9e83f>] flush_to_ldisc+0x3bf/0x7f0
[ 1544.187872]        [<ffffffff813a3eb6>] process_one_work+0x796/0x1440
[ 1544.187872]        [<ffffffff813a4c3b>] worker_thread+0xdb/0xfc0
[ 1544.187872]        [<ffffffff813b7edf>] kthread+0x23f/0x2d0
[ 1544.187872]        [<ffffffff866608ef>] ret_from_fork+0x3f/0x70


It seems to me that tty_get_pgrp is a red herring. The ctrl lock is not
mentioned in the reports, and isig does call __isig/tty_get_pgrp just
before tty_driver_flush_buffer, so this looks like a stack unwinding bug.
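
For anyone who wants to line the two stacks up mechanically, here is a
minimal sketch of a frame-by-frame comparison, walking from the outermost
frame inward. It is only an illustration: the helper names parse_frames and
common_outer_frames are made up, and the input is just the frames between
flush_to_ldisc and the innermost tty function, copied from the two traces
above.

```python
import re

# Matches "[<address>] symbol+0xOFFSET/0xSIZE" as printed in the traces.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)\+(0x[0-9a-f]+)/0x[0-9a-f]+")

# Frames copied from the "new dependency" trace (innermost first).
DISCOVERY = """
[<ffffffff82f9f08f>] tty_buffer_flush+0xbf/0x3c0
[<ffffffff82fa330c>] pty_flush_buffer+0x5c/0x180
[<ffffffff82f97a05>] tty_driver_flush_buffer+0x65/0x80
[<ffffffff82f8d162>] isig+0x172/0x2c0
[<ffffffff82f8fe52>] n_tty_receive_signal_char+0x22/0xf0
[<ffffffff82f93a4e>] n_tty_receive_char_special+0x126e/0x2b30
[<ffffffff82f96cb3>] n_tty_receive_buf_common+0x19a3/0x2400
[<ffffffff82f97743>] n_tty_receive_buf2+0x33/0x40
[<ffffffff82f9e83f>] flush_to_ldisc+0x3bf/0x7f0
"""

# Frames copied from the "-> #3" entry of the report (innermost first).
REPORT = """
[<ffffffff82f7c810>] tty_get_pgrp+0x20/0x80
[<ffffffff82f8afca>] __isig+0x1a/0x50
[<ffffffff82f8d09e>] isig+0xae/0x2c0
[<ffffffff82f8fe52>] n_tty_receive_signal_char+0x22/0xf0
[<ffffffff82f93a6d>] n_tty_receive_char_special+0x128d/0x2b30
[<ffffffff82f96cb3>] n_tty_receive_buf_common+0x19a3/0x2400
[<ffffffff82f97743>] n_tty_receive_buf2+0x33/0x40
[<ffffffff82f9e83f>] flush_to_ldisc+0x3bf/0x7f0
"""


def parse_frames(text):
    """Return [(symbol, offset), ...] in printed order (innermost first)."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(text)]


def common_outer_frames(a, b, key=lambda frame: frame):
    """Count frames that agree under `key`, starting from the outermost one."""
    count = 0
    for fa, fb in zip(reversed(a), reversed(b)):
        if key(fa) != key(fb):
            break
        count += 1
    return count


discovery = parse_frames(DISCOVERY)
report = parse_frames(REPORT)

# Compare by function name only, then by function name plus return offset.
same_names = common_outer_frames(discovery, report, key=lambda f: f[0])
same_exact = common_outer_frames(discovery, report)

print("outer frames with the same function name:", same_names)
print("outer frames with the same name and offset:", same_exact)
print("first frame below the common names (discovery):",
      discovery[len(discovery) - same_names - 1])
print("first frame below the common names (report):",
      report[len(report) - same_names - 1])
```

On these inputs the function names agree for six frames, from flush_to_ldisc
up to and including isig, and the stacks split right below isig
(tty_driver_flush_buffer versus __isig). Comparing offsets as well, only the
three outermost frames match exactly: the return offsets inside
n_tty_receive_char_special (0x126e versus 0x128d) and isig (0x172 versus
0xae) differ between the two traces.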
