From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Jones <davej@codemonkey.org.uk>, Chris Mason <clm@fb.com>,
	Andy Lutomirski <luto@kernel.org>, Jens Axboe <axboe@fb.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Josef Bacik <jbacik@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: bio linked list corruption.
Date: Mon, 24 Oct 2016 13:46:42 -0700
Message-ID: <CA+55aFycKG_7qpSs_pH7ibnOYL9vM85UMGhUFEGC-qpB4qkb5A@mail.gmail.com>
In-Reply-To: <CALCETrXoGd3gq=g61q07JDNTSaY7TjDoPQd3F8UgiwDfyJVLug@mail.gmail.com>

On Mon, Oct 24, 2016 at 1:06 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> [69943.450108] Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>
> This is an unhandled kernel page fault.  The string "Oops" is so helpful :-/

I think there was a line above it that DaveJ just didn't include.

>
>> [69943.454452] CPU: 1 PID: 21558 Comm: trinity-c60 Not tainted 4.9.0-rc1-think+ #11
>> [69943.463510] task: ffff8804f8dd3740 task.stack: ffffc9000b108000
>> [69943.468077] RIP: 0010:[<ffffffff810c3f6b>]
>> [69943.472704]  [<ffffffff810c3f6b>] __lock_acquire.isra.32+0x6b/0x8c0
>> [69943.477489] RSP: 0018:ffffc9000b10b9e8  EFLAGS: 00010086
>> [69943.482368] RAX: ffffffff81789b90 RBX: ffff8804f8dd3740 RCX: 0000000000000000
>> [69943.487410] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> [69943.492515] RBP: ffffc9000b10ba18 R08: 0000000000000001 R09: 0000000000000000
>> [69943.497666] R10: 0000000000000001 R11: 00003f9cfa7f4e73 R12: 0000000000000000
>> [69943.502880] R13: 0000000000000000 R14: ffffc9000af7bd48 R15: ffff8804f8dd3740
>> [69943.508163] FS:  00007f64904a2b40(0000) GS:ffff880507a00000(0000) knlGS:0000000000000000
>> [69943.513591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [69943.518917] CR2: ffffffff81789d28 CR3: 00000004a8f16000 CR4: 00000000001406e0
>> [69943.524253] DR0: 00007f5b97fd4000 DR1: 0000000000000000 DR2: 0000000000000000
>> [69943.529488] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> [69943.534771] Stack:
>> [69943.540023]  ffff880507bd74c0 ffff8804f8dd3740 0000000000000046 0000000000000286
>> [69943.545456]  ffffc9000af7bd08 0000000000000100 ffffc9000b10ba50 ffffffff810c4b68
>> [69943.551069]  ffffffff810ba40c ffff880400000000 0000000000000000 ffffc9000af7bd48
>> [69943.556796] Call Trace:
>> [69943.562465]  [<ffffffff810c4b68>] lock_acquire+0x58/0x70
>> [69943.568354]  [<ffffffff810ba40c>] ? finish_wait+0x3c/0x70
>> [69943.574306]  [<ffffffff8178fef2>] _raw_spin_lock_irqsave+0x42/0x80
>> [69943.580335]  [<ffffffff810ba40c>] ? finish_wait+0x3c/0x70
>> [69943.586237]  [<ffffffff810ba40c>] finish_wait+0x3c/0x70
>> [69943.591992]  [<ffffffff81169727>] shmem_fault+0x167/0x1b0
>> [69943.597807]  [<ffffffff810ba6c0>] ? prepare_to_wait_event+0x100/0x100
>> [69943.603741]  [<ffffffff8117b46d>] __do_fault+0x6d/0x1b0
>> [69943.609743]  [<ffffffff8117f168>] handle_mm_fault+0xc58/0x1170
>> [69943.615822]  [<ffffffff8117e553>] ? handle_mm_fault+0x43/0x1170
>> [69943.621971]  [<ffffffff81044982>] __do_page_fault+0x172/0x4e0
>> [69943.628184]  [<ffffffff81044d10>] do_page_fault+0x20/0x70
>> [69943.634449]  [<ffffffff8132a897>] ? debug_smp_processor_id+0x17/0x20
>> [69943.640784]  [<ffffffff81791f3f>] page_fault+0x1f/0x30
>> [69943.647170]  [<ffffffff8133d69c>] ? strncpy_from_user+0x5c/0x170
>> [69943.653480]  [<ffffffff8133d686>] ? strncpy_from_user+0x46/0x170
>> [69943.659632]  [<ffffffff811f22a7>] setxattr+0x57/0x170
>> [69943.665846]  [<ffffffff8132a897>] ? debug_smp_processor_id+0x17/0x20
>> [69943.672172]  [<ffffffff810c1f09>] ? get_lock_stats+0x19/0x50
>> [69943.678558]  [<ffffffff810a58f6>] ? sched_clock_cpu+0xb6/0xd0
>> [69943.685007]  [<ffffffff810c40cf>] ? __lock_acquire.isra.32+0x1cf/0x8c0
>> [69943.691542]  [<ffffffff8132a8b3>] ? __this_cpu_preempt_check+0x13/0x20
>> [69943.698130]  [<ffffffff8109b9bc>] ? preempt_count_add+0x7c/0xc0
>> [69943.704791]  [<ffffffff811ecda1>] ? __mnt_want_write+0x61/0x90
>> [69943.711519]  [<ffffffff811f2638>] SyS_fsetxattr+0x78/0xa0
>> [69943.718300]  [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
>> [69943.724949]  [<ffffffff81790a4b>] entry_SYSCALL64_slow_path+0x25/0x25
>> [69943.731521] Code:
>> [69943.738124] 00 83 fe 01 0f 86 0e 03 00 00 31 d2 4c 89 f7 44 89 45 d0 89 4d d4 e8 75 e7 ff ff 8b 4d d4 48 85 c0 44 8b 45 d0 0f 84 d8 02 00 00 <f0> ff 80 98 01 00 00 8b 15 e0 21 8f 01 45 8b 8f 50 08 00 00 85
>
> That's lock incl 0x198(%rax).  I think this is:
>
>     atomic_inc((atomic_t *)&class->ops);
>
> I suppose this could be stack corruption at work, but after a fair
> amount of staring, I still haven't found anything in the vmap_stack
> code that would cause stack corruption.
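
As a cross-check on that decode, here is a small userspace sketch (not
kernel code, and the helper names are made up for illustration). It picks
apart the seven bytes at the <f0> marker in the Code: line and checks that
RAX plus the displacement lands on CR2. The register values are copied
from the oops above. It only handles the plain base+disp32 form, so it is
not a general x86 decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: only the fields needed for this one opcode. */
struct lock_inc {
	int      lock_prefix;  /* 0xf0 prefix present */
	int      is_inc;       /* opcode 0xff with ModRM.reg == 0: INC r/m32 */
	unsigned base_reg;     /* ModRM.rm; 0 is %rax (no SIB handling here) */
	uint32_t disp;         /* disp32, present when ModRM.mod == 2 */
};

static struct lock_inc decode_lock_inc(const uint8_t *b, size_t len)
{
	struct lock_inc d = { 0, 0, 0, 0 };
	uint8_t modrm;

	assert(b != NULL && len >= 7);
	modrm = b[2];
	d.lock_prefix = (b[0] == 0xf0);
	d.is_inc = (b[1] == 0xff) && (((modrm >> 3) & 7) == 0);
	d.base_reg = modrm & 7;
	if ((modrm >> 6) == 2)  /* mod == 10b: little-endian disp32 follows */
		d.disp = (uint32_t)b[3] | ((uint32_t)b[4] << 8) |
			 ((uint32_t)b[5] << 16) | ((uint32_t)b[6] << 24);
	return d;
}

/* x86 page-fault error code: bit 0 = page was present, bit 1 = write. */
static int fault_is_write_to_present_page(unsigned err)
{
	return (err & 0x1) && (err & 0x2);
}

/* Values copied from the oops quoted above. */
static const uint8_t  oops_code[7] = { 0xf0, 0xff, 0x80, 0x98, 0x01, 0x00, 0x00 };
static const uint64_t oops_rax = 0xffffffff81789b90ULL;
static const uint64_t oops_cr2 = 0xffffffff81789d28ULL;
```

With those inputs the decode gives a locked incl with displacement 0x198
off %rax, and RAX + 0x198 is exactly CR2. Error code 0003 means a write to
a page that was present, and RAX sits in the same address range as the
kernel text addresses in the call trace. That is at least consistent with
'class' not being a real lock class pointer.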

Well, it is intriguing that what faults is this:

                        finish_wait(shmem_falloc_waitq, &shmem_fault_wait);

where 'shmem_fault_wait' is an on-stack wait queue entry. So it really
looks very much like stack corruption.

What strikes me is that "finish_wait()" does this optimistic "has my
entry been removed" without holding the waitqueue lock (and uses
list_empty_careful() to make sure it does that "safely").
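
To make that concrete, here is a simplified single-threaded userspace
model of the pattern. It is a sketch only: the *_model and *_entry names
and the lock_taken counter are invented for illustration, there is no real
locking or memory ordering, and it is not the kernel source. The point it
shows is that the waitqueue head is only touched when the entry still
looks queued.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-ins for the waitqueue types; lock_taken counts head accesses. */
struct wq_head  { int lock_taken; struct list_head task_list; };
struct wq_entry { struct list_head task_list; };

static void init_list(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_entry(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

static void list_del_init_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list(e);  /* entry points back at itself: "not queued" */
}

/* Empty only if both next and prev point back at the node itself. */
static int list_empty_careful_model(const struct list_head *h)
{
	const struct list_head *next = h->next;
	return next == h && next == h->prev;
}

static void finish_wait_model(struct wq_head *q, struct wq_entry *wait)
{
	assert(wait != NULL);
	if (!list_empty_careful_model(&wait->task_list)) {
		/* Only on this path is q dereferenced, so q must be valid. */
		q->lock_taken++;  /* stands in for taking q->lock */
		list_del_init_entry(&wait->task_list);
	}
}
```

If the waker already did the list_del_init(), the head pointer is never
dereferenced and can be stale. If the entry is still queued, the head gets
locked and had better still exist.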

It has that big comment too:

                        /*
                         * shmem_falloc_waitq points into the shmem_fallocate()
                         * stack of the hole-punching task: shmem_falloc_waitq
                         * is usually invalid by the time we reach here, but
                         * finish_wait() does not dereference it in that case;
                         * though i_lock needed lest racing with wake_up_all().
                         */

the stack it comes from is the wait queue head from shmem_fallocate(),
which will do "wake_up_all()" under the inode lock.

On the face of it, the inode lock should make that safe and serialize
everything. And yes, finish_wait() does not touch the unsafe stuff if
the wait-queue (in the local stack) is empty, which wake_up_all()
*should* have guaranteed. It's just a regular wait-queue entry (that's
what DEFINE_WAIT() sets up), so it uses the normal
autoremove_wake_function() that removes things on successful wakeup:

int autoremove_wake_function(wait_queue_t *wait, unsigned mode,
                             int sync, void *key)
{
        int ret = default_wake_function(wait, mode, sync, key);

        if (ret)
                list_del_init(&wait->task_list);
        return ret;
}

So the only issue is "did default_wake_function() return true"? That's
try_to_wake_up(TASK_NORMAL, 0), and I note that it can return zero
(and thus *not* remove the entry - leaving the invalid entry there)
if

        if (!(p->state & state))
                goto out;

but "prepare_to_wait()" (which also ran with the inode->i_lock held,
and also takes the wait-queue lock) did set p->state to
TASK_UNINTERRUPTIBLE.
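
The two outcomes can be sketched as a toy userspace model. Everything here
is an illustration under assumptions: the *_model names are made up, a
'queued' flag stands in for the list linkage, and the state constants only
mirror the usual kernel values rather than including kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the usual kernel task state values, for illustration only. */
enum {
	MODEL_TASK_RUNNING         = 0,
	MODEL_TASK_INTERRUPTIBLE   = 1,
	MODEL_TASK_UNINTERRUPTIBLE = 2,
	MODEL_TASK_NORMAL = MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_UNINTERRUPTIBLE
};

struct task_model { long state; };
struct wait_model { struct task_model *task; int queued; };

/* Returns 0 without waking if the task is not in one of the asked states. */
static int try_to_wake_up_model(struct task_model *p, unsigned int state)
{
	assert(p != NULL);
	if (!(p->state & state))
		return 0;
	p->state = MODEL_TASK_RUNNING;
	return 1;
}

/* Dequeue the entry only when the wakeup actually succeeded. */
static int autoremove_wake_model(struct wait_model *w, unsigned int mode)
{
	int ret = try_to_wake_up_model(w->task, mode);

	if (ret)
		w->queued = 0;  /* stands in for list_del_init() */
	return ret;
}
```

A waiter that went through prepare_to_wait() and is sitting in
TASK_UNINTERRUPTIBLE gets woken and dequeued. A waiter whose state is
already TASK_RUNNING makes the wake return zero and stays on the queue,
which is the case that would leave a stale entry behind.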

So this is all some really subtle code, but I'm not seeing that it
would be wrong.

            Linus
