linux-kernel.vger.kernel.org archive mirror
From: Chris Mason <clm@fb.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Jones <davej@codemonkey.org.uk>,
	Andy Lutomirski <luto@amacapital.net>,
	"Andy Lutomirski" <luto@kernel.org>, Jens Axboe <axboe@fb.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Josef Bacik <jbacik@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: bio linked list corruption.
Date: Wed, 26 Oct 2016 16:00:23 -0400
Message-ID: <488f9edc-6a1c-2c68-0d33-d3aa32ece9a4@fb.com>
In-Reply-To: <CA+55aFwD9McVapb0svQrrvP1k6iSkqz5ENNGXY6b+Yo-k7wOsg@mail.gmail.com>



On 10/26/2016 03:06 PM, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
>>
>> The stacks show nearly all of them are stuck in sync_inodes_sb
> 
> That's just wb_wait_for_completion(), and it means that some IO isn't
> completing.
> 
> There's also a lot of processes waiting for inode_lock(), and a few
> waiting for mnt_want_write()
> 
> Ignoring those, we have
> 
>> [<ffffffffa009554f>] btrfs_wait_ordered_roots+0x3f/0x200 [btrfs]
>> [<ffffffffa00470d1>] btrfs_sync_fs+0x31/0xc0 [btrfs]
>> [<ffffffff811fbd4e>] sync_filesystem+0x6e/0xa0
>> [<ffffffff811fbebc>] SyS_syncfs+0x3c/0x70
>> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
>> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
>> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Don't know this one. There's a couple of them. Could there be some
> ABBA deadlock on the ordered roots waiting?

It's always possible, but we haven't changed anything here.

I've tried a long list of things to reproduce this on my test boxes,
including days of trinity runs and a kernel module to exercise vmalloc
and thread creation.

Today I turned off every CONFIG_DEBUG_* except for list debugging, and
ran dbench 2048:

[ 2759.118711] WARNING: CPU: 2 PID: 31039 at lib/list_debug.c:33 __list_add+0xbe/0xd0
[ 2759.119652] list_add corruption. prev->next should be next (ffffe8ffffc80308), but was ffffc90000ccfb88. (prev=ffff880128522380).
[ 2759.121039] Modules linked in: crc32c_intel i2c_piix4 aesni_intel aes_x86_64 virtio_net glue_helper i2c_core lrw floppy gf128mul serio_raw pcspkr button ablk_helper cryptd sch_fq_codel autofs4 virtio_blk
[ 2759.124369] CPU: 2 PID: 31039 Comm: dbench Not tainted 4.9.0-rc1-15246-g4ce9206-dirty #317
[ 2759.125077] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
[ 2759.125077]  ffffc9000f6fb868 ffffffff814fe4ff ffffffff8151cb5e ffffc9000f6fb8c8
[ 2759.125077]  ffffc9000f6fb8c8 0000000000000000 ffffc9000f6fb8b8 ffffffff81064bbf
[ 2759.127444]  ffff880128523680 0000002139968000 ffff880138b7a4a0 ffff880128523540
[ 2759.127444] Call Trace:
[ 2759.127444]  [<ffffffff814fe4ff>] dump_stack+0x53/0x74
[ 2759.127444]  [<ffffffff8151cb5e>] ? __list_add+0xbe/0xd0
[ 2759.127444]  [<ffffffff81064bbf>] __warn+0xff/0x120
[ 2759.127444]  [<ffffffff81064c99>] warn_slowpath_fmt+0x49/0x50
[ 2759.127444]  [<ffffffff8151cb5e>] __list_add+0xbe/0xd0
[ 2759.127444]  [<ffffffff814df338>] blk_sq_make_request+0x388/0x580
[ 2759.127444]  [<ffffffff814d5b44>] generic_make_request+0x104/0x200
[ 2759.127444]  [<ffffffff814d5ca5>] submit_bio+0x65/0x130
[ 2759.127444]  [<ffffffff8152a946>] ? __percpu_counter_add+0x96/0xd0
[ 2759.127444]  [<ffffffff814260bc>] btrfs_map_bio+0x23c/0x310
[ 2759.127444]  [<ffffffff813f42b3>] btrfs_submit_bio_hook+0xd3/0x190
[ 2759.127444]  [<ffffffff814117ad>] submit_one_bio+0x6d/0xa0
[ 2759.127444]  [<ffffffff8141182e>] flush_epd_write_bio+0x4e/0x70
[ 2759.127444]  [<ffffffff81418d8d>] extent_writepages+0x5d/0x70
[ 2759.127444]  [<ffffffff813f84e0>] ? btrfs_releasepage+0x50/0x50
[ 2759.127444]  [<ffffffff81220ffe>] ? wbc_attach_and_unlock_inode+0x6e/0x170
[ 2759.127444]  [<ffffffff813f5047>] btrfs_writepages+0x27/0x30
[ 2759.127444]  [<ffffffff81178690>] do_writepages+0x20/0x30
[ 2759.127444]  [<ffffffff81167d85>] __filemap_fdatawrite_range+0xb5/0x100
[ 2759.127444]  [<ffffffff81168263>] filemap_fdatawrite_range+0x13/0x20
[ 2759.127444]  [<ffffffff81405a7b>] btrfs_fdatawrite_range+0x2b/0x70
[ 2759.127444]  [<ffffffff81405ba8>] btrfs_sync_file+0x88/0x490
[ 2759.127444]  [<ffffffff810751e2>] ? group_send_sig_info+0x42/0x80
[ 2759.127444]  [<ffffffff8107527d>] ? kill_pid_info+0x5d/0x90
[ 2759.127444]  [<ffffffff8107564a>] ? SYSC_kill+0xba/0x1d0
[ 2759.127444]  [<ffffffff811f2638>] ? __sb_end_write+0x58/0x80
[ 2759.127444]  [<ffffffff81225b9c>] vfs_fsync_range+0x4c/0xb0
[ 2759.127444]  [<ffffffff81002501>] ? syscall_trace_enter+0x201/0x2e0
[ 2759.127444]  [<ffffffff81225c1c>] vfs_fsync+0x1c/0x20
[ 2759.127444]  [<ffffffff81225c5d>] do_fsync+0x3d/0x70
[ 2759.127444]  [<ffffffff810029cb>] ? syscall_slow_exit_work+0xfb/0x100
[ 2759.127444]  [<ffffffff81225cc0>] SyS_fsync+0x10/0x20
[ 2759.127444]  [<ffffffff81002b65>] do_syscall_64+0x55/0xd0
[ 2759.127444]  [<ffffffff810026d7>] ? prepare_exit_to_usermode+0x37/0x40
[ 2759.127444]  [<ffffffff819ad986>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2759.150635] ---[ end trace 3b5b7e2ef61c3d02 ]---
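
For readers unfamiliar with the warning above: with list debugging enabled, the kernel checks that the two neighbours still point at each other before linking a new entry between them. What follows is only a simplified userspace model of that check, not the code in lib/list_debug.c. The function name checked_list_add() and the bool return value are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical helper: before linking "new" between "prev" and "next",
 * confirm that the two neighbours still point at each other.  Returns
 * false and leaves the list untouched if they do not.
 */
static bool checked_list_add(struct list_head *new,
			     struct list_head *prev,
			     struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}

	/* Neighbours are consistent, so splice the new entry in. */
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}
```

In the trace above it is the second check that fired: prev->next no longer pointed at the expected next entry.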

I put a variant of your suggested patch in place, but my printk never
triggered.  Now that I've made it happen once, I'll make sure I can do it
over and over again.  This kernel doesn't yet have the patches that Andy
asked Davej to try out, but I'll try them once I have a reliable reproducer.

diff --git a/kernel/fork.c b/kernel/fork.c
index 623259f..de95e19 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -165,7 +165,7 @@ void __weak arch_release_thread_stack(unsigned long *stack)
  * vmalloc() is a bit slow, and calling vfree() enough times will force a TLB
  * flush.  Try to minimize the number of calls by caching stacks.
  */
-#define NR_CACHED_STACKS 2
+#define NR_CACHED_STACKS 256
 static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);
 #endif
 
@@ -173,7 +173,9 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 {
 #ifdef CONFIG_VMAP_STACK
        void *stack;
+       char *p;
        int i;
+       int j;
 
        local_irq_disable();
        for (i = 0; i < NR_CACHED_STACKS; i++) {
@@ -183,7 +185,15 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
                        continue;
                this_cpu_write(cached_stacks[i], NULL);
 
+               p = s->addr;
+               for (j = 0; j < THREAD_SIZE; j++) {
+                       if (p[j] != 'c') {
+                               printk_ratelimited(KERN_CRIT "bad poison %c byte %d\n", p[j], j);
+                               break;
+                       }
+               }
                tsk->stack_vm_area = s;
+
                local_irq_enable();
                return s->addr;
        }
@@ -219,6 +229,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
                int i;
 
                local_irq_save(flags);
+               memset(tsk->stack_vm_area->addr, 'c', THREAD_SIZE);
                for (i = 0; i < NR_CACHED_STACKS; i++) {
                        if (this_cpu_read(cached_stacks[i]))
                                continue;
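
The debugging idea in the patch above is to fill a stack with a known byte when it goes into the cache and to verify that fill when the stack is taken back out, so that any write to a cached (supposedly unused) stack shows up. Below is a minimal userspace sketch of the same poison-and-check pattern. The names POISON_BYTE, poison_buffer() and find_bad_poison() are hypothetical and are not kernel APIs. The buffer here is ordinary memory, not a vmalloc'ed stack.

```c
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 'c'	/* same fill value the patch uses */

/* Fill a buffer with the poison byte when it is released to a cache. */
static void poison_buffer(void *buf, size_t len)
{
	memset(buf, POISON_BYTE, len);
}

/*
 * Scan a buffer that has just been taken back out of the cache.
 * Returns the offset of the first byte that no longer holds the
 * poison value, or -1 if the whole buffer is intact.
 */
static long find_bad_poison(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++) {
		if (p[i] != POISON_BYTE)
			return (long)i;
	}
	return -1;
}
```

A caller would invoke poison_buffer() on the free path and find_bad_poison() on the reuse path, reporting the offset if the result is not -1. As in the patch, this only catches writes that land while the buffer sits in the cache.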
