* Linus Torvalds wrote:

> [ Added some people to the cc - this is very directly related to the
> previous thread on "v2.6.31-rc6: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000008", and the deadlock discussion
> there ]
>
> On Tue, 8 Sep 2009, Ingo Molnar wrote:
> >
> > FYI, i'm getting very (very) rare warnings from the TTY code in this
> > place:
> >
> > [ 28.187364] rc.sysinit used greatest stack depth: 5224 bytes left
> > [ 31.422457] Adding 3911816k swap on /dev/sda2. Priority:-1 extents:1 across:3911816k
> > [ 32.974830] ssh used greatest stack depth: 5200 bytes left
> > [ 33.115028] ------------[ cut here ]------------
> > [ 33.119518] WARNING: at drivers/char/tty_io.c:1267 __tty_open+0x3ef/0x4c0()
>
> Hmm. I think I see why, and I _suspect_ this is harmless, although
> it's obviously very annoying, and it really is indicative of a real
> locking problem.
>
> What's going on is that same horrible deadlock-avoidance where we have to
> drop the ldisc_mutex after clearing TTY_LDISC, in order to then wait for
> any pending work. See commit 5c58ceff103d8a654f24769bb1baaf84a841b0cc,
> which is probably also the one that introduced the timing that gets your
> particular warning.
>
> So when __tty_open() does this:
>
> 	mutex_lock(&tty->ldisc_mutex);
> 	WARN_ON(!test_bit(TTY_LDISC, &tty->flags));
> 	mutex_unlock(&tty->ldisc_mutex);
>
> it's really warning about something that really can happen: the things
> that clear TTY_LDISC will all release the ldisc_mutex with that bit still
> clear, because they all end up having to release the lock that they
> _should_ hold in order to avoid a deadlock.
>
> So the warning is "real" in the sense that it does show a real locking
> problem. It's probably not _relevant_ in that it probably will never cause
> any other issues in practice.
>
> > I got it on two systems so far. Config attached (but is probably
> > irrelevant). The warnings started in the .31 cycle.
> > They occur every 1000-2000 random kernels - i.e. every few days.
>
> Yeah, the configuration won't matter.
>
> > These warnings were never fatal and my guess is that they are
> > ancient, pre-existing races in the TTY code - but wanted to mention
> > them here in case they matter.
>
> The issue is pre-existing, yes - we've always done that
>
> 	tty_ldisc_halt(tty);
> 	flush_scheduled_work();
>
> outside the ldisc_mutex, but the commit mentioned above (5c58ceff) added a
> new case where we do it (it used to be in just tty_set_ldisc() and in
> tty_ldisc_release()). So it's a pre-existing issue that probably just got
> _way_ easier to hit fairly recently.
>
> Quite frankly, the ldisc_mutex problem is not fixable at this stage in
> 2.6.31, and it's probably not worth worrying about. I'm planning on
> revisiting this after releasing 2.6.31 (probably just deciding that
> the sane way to fix it is to turn that flush_to_ldisc thing into just
> a timer, not a delayed work - which allows us to hold the mutex), but
> there's no way I'm doing that before..
>
> If the fix turns out straightforward, we can back-port it through
> stable.

Just to refresh this older thread - is this warning supposed to be gone
in latest -git? It still triggers occasionally in -tip tests:

[ 9.243982] quotaon used greatest stack depth: 5396 bytes left
[ 13.758784] Adding 3911816k swap on /dev/sda2. Priority:-1 extents:1 across:3911816k
[ 15.373560] ------------[ cut here ]------------
[ 15.374283] WARNING: at drivers/char/tty_io.c:1268 tty_open+0x20a/0x3b1()
[ 15.375257] Hardware name: System Product Name
[ 15.376216] Modules linked in:
[ 15.378215] Pid: 1706, comm: modprobe Not tainted 2.6.31-tip #16184
[ 15.379530] Call Trace:
[ 15.380217] [<793430e5>] ? tty_open+0x20a/0x3b1
[ 15.381217] [<7904c329>] warn_slowpath_common+0x6f/0xb0
[ 15.382215] [<7904c386>] warn_slowpath_null+0x1c/0x30
[ 15.383215] [<793430e5>] tty_open+0x20a/0x3b1
[ 15.384244] [<790c8f2b>] chrdev_open+0x111/0x139
[ 15.385215] [<790c4001>] __dentry_open+0x16c/0x270
[ 15.386215] [<790c8e1a>] ? chrdev_open+0x0/0x139
[ 15.387215] [<790c420f>] nameidata_to_filp+0x39/0x61
[ 15.388215] [<790d1ab1>] do_filp_open+0x455/0x7d3
[ 15.389246] [<796572fe>] ? _spin_unlock+0x35/0x5c
[ 15.390216] [<790daebd>] ? alloc_fd+0xd7/0xf2
[ 15.391215] [<790c3d42>] do_sys_open+0x53/0xe6
[ 15.392215] [<790c3e48>] sys_open+0x2c/0x45
[ 15.393221] [<79023507>] sysenter_do_call+0x12/0x3c
[ 15.395245] ---[ end trace 8e8143959784383e ]---

(config attached)

It's still never fatal, just a warning.

	Ingo
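The window described above (TTY_LDISC cleared, ldisc_mutex dropped so pending work can be waited for, and __tty_open() then taking the mutex and finding the bit clear) can be modeled in userspace. What follows is a minimal sketch, assuming POSIX threads. `struct fake_tty`, `halt_path()`, `open_path()`, `run_race_model()` and the event helpers are hypothetical stand-ins, not kernel APIs. Condition-variable handshakes force the bad interleaving so the outcome is deterministic instead of "every 1000-2000 kernels".

```c
/* Hypothetical userspace model of the TTY_LDISC window; none of these
 * names are kernel APIs. Build with: cc -pthread model.c */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct fake_tty {
    pthread_mutex_t ldisc_mutex;  /* models tty->ldisc_mutex */
    bool ldisc_bit;               /* models the TTY_LDISC flag */
};

/* One-shot event used only to force a specific interleaving. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool fired;
};

static void event_fire(struct event *e) {
    pthread_mutex_lock(&e->lock);
    e->fired = true;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

static void event_wait(struct event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->fired)             /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static struct fake_tty tty = { PTHREAD_MUTEX_INITIALIZER, true };
static struct event window_open = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
static struct event open_done = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
static bool observed_bit = true;

/* Halt path: clear the bit under the mutex, then drop the mutex before
 * waiting, as the real code does around flush_scheduled_work(). */
static void *halt_path(void *arg) {
    (void)arg;
    pthread_mutex_lock(&tty.ldisc_mutex);
    tty.ldisc_bit = false;               /* like tty_ldisc_halt() */
    pthread_mutex_unlock(&tty.ldisc_mutex);
    event_fire(&window_open);
    event_wait(&open_done);              /* stand-in for the wait; here it
                                            just waits for the open path */
    pthread_mutex_lock(&tty.ldisc_mutex);
    tty.ldisc_bit = true;                /* ldisc reinstated afterwards */
    pthread_mutex_unlock(&tty.ldisc_mutex);
    return NULL;
}

/* Open path: holds the mutex while checking the bit, like the WARN_ON. */
static void *open_path(void *arg) {
    (void)arg;
    event_wait(&window_open);
    pthread_mutex_lock(&tty.ldisc_mutex);
    observed_bit = tty.ldisc_bit;        /* false here = WARN_ON would fire */
    pthread_mutex_unlock(&tty.ldisc_mutex);
    event_fire(&open_done);
    return NULL;
}

/* Runs both paths once and returns the bit value the open path saw.
 * The events are one-shot, so call this only once per process. */
static bool run_race_model(void) {
    pthread_t halter, opener;
    int rc = pthread_create(&halter, NULL, halt_path, NULL);
    assert(rc == 0);
    rc = pthread_create(&opener, NULL, open_path, NULL);
    assert(rc == 0);
    pthread_join(halter, NULL);
    pthread_join(opener, NULL);
    return observed_bit;
}
```

In this model, run_race_model() returns false even though the open path held ldisc_mutex while reading the bit. That is the condition the WARN_ON reports. The sketch does not model why the real code must drop the mutex. Per the thread, it has to release it to wait for pending work without deadlocking, and the proposed timer-based flush_to_ldisc would remove that need.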