On 5 December 2016 at 18:55, Linus Torvalds wrote:
> On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum wrote:
>>
>> The warning shows that it made it past the list_empty_careful() check
>> in finish_wait() but then bugs out on the &wait->task_list
>> dereference.
>>
>> Anything stick out?
>
> I hate that shmem waitqueue garbage. It's really subtle.
>
> I think the problem is that "wake_up_all()" in shmem_fallocate()
> doesn't necessarily wake up everything. It wakes up TASK_NORMAL -
> which does include TASK_UNINTERRUPTIBLE, but doesn't actually mean
> "everything on the list".
>
> I think that what happens is that the waiters somehow move from
> TASK_UNINTERRUPTIBLE to TASK_RUNNING early, and this means that
> wake_up_all() will ignore them, leave them on the list, and now that
> list on stack is no longer empty at the end.
>
> And the way *THAT* can happen is that the task is on some *other*
> waitqueue as well, and that other waitqueue wakes it up. That's not
> impossible, you can certainly have people on wait-queues that still
> take faults.
>
> Or somebody just uses a directed wake_up_process() or something.
>
> Since you apparently can recreate this fairly easily, how about trying
> this stupid patch?
>
> NOTE! This is entirely untested. I may have screwed this up entirely.
> You get the idea, though - just remove the wait queue head from the
> list - the list entries stay around, but nothing points to the stack
> entry (that we're going to free) any more.
>
> And add the warning to see if this actually ever triggers (and because
> I'd like to see the callchain when it does, to see if it's another
> waitqueue somewhere or what..)

------------[ cut here ]------------
WARNING: CPU: 22 PID: 14012 at mm/shmem.c:2668 shmem_fallocate+0x9a7/0xac0
Kernel panic - not syncing: panic_on_warn set ...

CPU: 22 PID: 14012 Comm: trinity-c73 Not tainted 4.9.0-rc7+ #220
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
 ffff8801e32af970 ffffffff81fb08c1 ffffffff83e74b60 ffff8801e32afa48
 ffffffff83ed7600 ffffffff847103e0 ffff8801e32afa38 ffffffff81515244
 0000000041b58ab3 ffffffff844e21da ffffffff81515061 ffffffff8151591e
Call Trace:
 [] dump_stack+0x83/0xb2
 [] panic+0x1e3/0x3ad
 [] __warn+0x1bf/0x1e0
 [] warn_slowpath_null+0x2c/0x40
 [] shmem_fallocate+0x9a7/0xac0
 [] vfs_fallocate+0x350/0x620
 [] SyS_madvise+0x432/0x1290
 [] do_syscall_64+0x1af/0x4d0
 [] entry_SYSCALL64_slow_path+0x25/0x25
------------[ cut here ]------------

Attached a full log.


Vegard
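
For reference, here is a rough userspace model of the mechanism described in the quoted explanation: a wake-all that only touches sleeping tasks will skip a waiter that was already made runnable by something else, and that waiter's entry then stays linked to the list head. This is only a sketch and not kernel code; the names model_wake_up_all() and model_stale_waiters(), the struct waiter type, and the enum values are all made up for illustration. It assumes the waiters use an autoremove-style wake function, so an entry is unlinked only when the wakeup actually changes the task state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified task states; the values are illustrative, not the kernel's. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct list_head { struct list_head *next, *prev; };

/* Hypothetical stand-in for a wait queue entry plus its task state. */
struct waiter {
	struct list_head entry;	/* first member, so a plain cast recovers the waiter */
	enum task_state state;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/*
 * Model of a wake-all with autoremove semantics: sleeping waiters are made
 * runnable and unlinked; waiters that are already running are skipped and
 * therefore stay on the list.
 */
static int model_wake_up_all(struct list_head *head)
{
	struct list_head *pos = head->next;
	int woken = 0;

	while (pos != head) {
		struct list_head *next = pos->next;
		struct waiter *w = (struct waiter *)pos;

		if (w->state != TASK_RUNNING) {
			w->state = TASK_RUNNING;
			list_del_init(pos);
			woken++;
		}
		pos = next;
	}
	return woken;
}

/*
 * Queue two waiters on an on-stack head, optionally make one runnable early
 * (as another waitqueue or a directed wakeup might), run the wake-all, and
 * return how many entries are still linked to the head afterwards.
 */
static int model_stale_waiters(bool woken_early)
{
	struct list_head head;
	struct waiter a = { .state = TASK_UNINTERRUPTIBLE };
	struct waiter b = { .state = TASK_UNINTERRUPTIBLE };
	int left = 0;

	list_init(&head);
	list_add_tail(&a.entry, &head);
	list_add_tail(&b.entry, &head);

	if (woken_early)
		b.state = TASK_RUNNING;

	model_wake_up_all(&head);

	for (struct list_head *pos = head.next; pos != &head; pos = pos->next)
		left++;
	return left;
}
```

In this model, model_stale_waiters(false) leaves nothing on the list, while model_stale_waiters(true) leaves one entry still pointing at the on-stack head, which is the situation the suggested patch avoids by unlinking the head itself before it goes out of scope.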