All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Jones <davej@codemonkey.org.uk>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Chris Mason <clm@fb.com>, Jens Axboe <axboe@fb.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Josef Bacik <jbacik@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: bio linked list corruption.
Date: Fri, 21 Oct 2016 16:02:45 -0400	[thread overview]
Message-ID: <20161021200245.kahjzgqzdfyoe3uz@codemonkey.org.uk> (raw)
In-Reply-To: <CALCETrVHXPw1PyKSQja07V+MxHJPcDkaLJ1rPF2Z3286tHY2Xw@mail.gmail.com>

On Thu, Oct 20, 2016 at 04:23:32PM -0700, Andy Lutomirski wrote:
 > On Thu, Oct 20, 2016 at 4:03 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > > On Thu, Oct 20, 2016 at 04:01:12PM -0700, Andy Lutomirski wrote:
 > >  > On Thu, Oct 20, 2016 at 3:50 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > >  > > On Tue, Oct 18, 2016 at 06:05:57PM -0700, Andy Lutomirski wrote:
 > >  > >
 > >  > >  > One possible debugging approach would be to change:
 > >  > >  >
 > >  > >  > #define NR_CACHED_STACKS 2
 > >  > >  >
 > >  > >  > to
 > >  > >  >
 > >  > >  > #define NR_CACHED_STACKS 0
 > >  > >  >
 > >  > >  > in kernel/fork.c and to set CONFIG_DEBUG_PAGEALLOC=y.  The latter will
 > >  > >  > force an immediate TLB flush after vfree.
 > >  > >
 > >  > > I can give that idea some runtime, but it sounds like this a case where
 > >  > > we're trying to prove a negative, and that'll just run and run ? In which case I
 > >  > > might do this when I'm travelling on Sunday.
 > >  >
 > >  > The idea is that the stack will be free and unmapped immediately upon
 > >  > process exit if configured like this so that bogus stack accesses (by
 > >  > the CPU, not DMA) would OOPS immediately.
 > >
 > > oh, misparsed. ok, I can definitely get behind that idea then.
 > > I'll do that next.
 > >
 > 
 > It could be worth trying this, too:
 > 
 > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=174531fef4e8
 > 
 > It occurred to me that the current code is a little bit fragile.

It's been nearly 24hrs with the above changes, and it's been pretty much
silent the whole time.

The only thing of note over that time period has been a btrfs lockdep
warning that's been around for a while, and occasional btrfs checksum
failures, which I've been seeing for a while, but seem to have gotten
worse since 4.8.

I'm pretty confident in the disk being ok in this machine, so I think
the checksum warnings are bogus.  Chris suggested they may be the result
of memory corruption, but there's little else going on.


BTRFS warning (device sda3): csum failed ino 130654 off 0 csum 2566472073 expected csum 3008371513
BTRFS warning (device sda3): csum failed ino 131057 off 4096 csum 3563910319 expected csum 738595262
BTRFS warning (device sda3): csum failed ino 131176 off 4096 csum 1344477721 expected csum 441864825
BTRFS warning (device sda3): csum failed ino 131241 off 245760 csum 3576232181 expected csum 2566472073
BTRFS warning (device sda3): csum failed ino 131429 off 0 csum 1494450239 expected csum 2646577722
BTRFS warning (device sda3): csum failed ino 131471 off 0 csum 3949539320 expected csum 3828807800
BTRFS warning (device sda3): csum failed ino 131471 off 4096 csum 3475108475 expected csum 2566472073
BTRFS warning (device sda3): csum failed ino 131471 off 958464 csum 142982740 expected csum 2566472073
BTRFS warning (device sda3): csum failed ino 131471 off 0 csum 3949539320 expected csum 3828807800
BTRFS warning (device sda3): csum failed ino 131532 off 270336 csum 3138898528 expected csum 2566472073
BTRFS warning (device sda3): csum failed ino 131532 off 1249280 csum 2169165042 expected csum 2566472073
BTRFS warning (device sda3): csum failed ino 131649 off 16384 csum 2914965650 expected csum 1425742005


A curious thing: the expected csum 2566472073 turns up a number of times for different inodes, and gets
differing actual csums each time.  I suppose this could be something like a block of all zeros in multiple files,
but it struck me as surprising.

btrfs people: is there an easy way to map those inodes to a filename ? I'm betting those are the
test files that trinity generates. If so, it might point to a race somewhere.

	Dave


  reply	other threads:[~2016-10-21 20:02 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-11 14:45 btrfs bio linked list corruption Dave Jones
2016-10-11 15:11 ` Al Viro
2016-10-11 15:19   ` Dave Jones
2016-10-11 15:20     ` Chris Mason
2016-10-11 15:49       ` Dave Jones
2016-10-11 15:54 ` Chris Mason
2016-10-11 16:25   ` Dave Jones
2016-10-12 13:47   ` Dave Jones
2016-10-12 14:40     ` Dave Jones
2016-10-12 14:42       ` Chris Mason
2016-10-13 18:16         ` Dave Jones
2016-10-13 21:18           ` Chris Mason
2016-10-13 21:56             ` Dave Jones
2016-10-16  0:42             ` Dave Jones
2016-10-18  1:07               ` Chris Mason
2016-10-18 22:42 ` Dave Jones
2016-10-18 23:12   ` Jens Axboe
2016-10-18 23:31     ` Chris Mason
2016-10-18 23:36       ` Jens Axboe
2016-10-18 23:39       ` Linus Torvalds
2016-10-18 23:42         ` Chris Mason
2016-10-19  0:10           ` Linus Torvalds
2016-10-19  0:19             ` Chris Mason
2016-10-19  0:28             ` Linus Torvalds
2016-10-20 22:48               ` Dave Jones
2016-10-19  1:05             ` Andy Lutomirski
2016-10-20 22:50               ` Dave Jones
2016-10-20 23:01                 ` Andy Lutomirski
2016-10-20 23:03                   ` Dave Jones
2016-10-20 23:23                     ` Andy Lutomirski
2016-10-21 20:02                       ` Dave Jones [this message]
2016-10-21 20:17                         ` Chris Mason
2016-10-21 20:23                           ` Dave Jones
2016-10-21 20:38                             ` Chris Mason
2016-10-21 20:41                               ` Josef Bacik
2016-10-21 21:11                                 ` Dave Jones
2016-10-22 15:20                         ` Dave Jones
2016-10-23 21:32                           ` Chris Mason
2016-10-24  4:40                             ` Dave Jones
2016-10-24 13:42                               ` Chris Mason
2016-10-26  0:27                                 ` Dave Jones
2016-10-26  1:33                                   ` Linus Torvalds
2016-10-26  1:39                                     ` Linus Torvalds
2016-10-26 16:30                                       ` Dave Jones
2016-10-26 16:48                                         ` Linus Torvalds
2016-10-26 18:18                                           ` Dave Jones
2016-10-26 18:42                                           ` Dave Jones
2016-10-26 19:06                                             ` Linus Torvalds
2016-10-26 20:00                                               ` Chris Mason
2016-10-26 21:52                                                 ` Chris Mason
2016-10-26 22:21                                                   ` Linus Torvalds
2016-10-26 22:40                                                     ` Dave Jones
2016-10-26 22:51                                                       ` Linus Torvalds
2016-10-26 22:55                                                         ` Jens Axboe
2016-10-26 22:58                                                         ` Linus Torvalds
2016-10-26 23:03                                                           ` Jens Axboe
2016-10-26 23:07                                                             ` Dave Jones
2016-10-26 23:08                                                             ` Linus Torvalds
2016-10-26 23:20                                                               ` Jens Axboe
2016-10-26 23:38                                                                 ` Chris Mason
2016-10-26 23:47                                                                   ` Dave Jones
2016-10-27  0:00                                                                     ` Jens Axboe
2016-10-27 13:33                                                                       ` Chris Mason
2016-10-31 18:55                                                                     ` Dave Jones
2016-10-31 19:35                                                                       ` Linus Torvalds
2016-10-31 19:44                                                                         ` Chris Mason
2016-11-06 16:55                                                                           ` btrfs btree_ctree_super fault Dave Jones
2016-11-08 14:59                                                                             ` Dave Jones
2016-11-08 15:08                                                                               ` Chris Mason
2016-11-10 14:35                                                                                 ` Dave Jones
2016-11-10 15:27                                                                                   ` Chris Mason
2016-11-23 19:34                                                                           ` bio linked list corruption Dave Jones
2016-11-23 19:58                                                                             ` Dave Jones
2016-12-01 15:32                                                                               ` btrfs_destroy_inode warn (outstanding extents) Dave Jones
2016-12-03 16:48                                                                                 ` Dave Jones
2016-12-07 16:15                                                                                   ` Dave Jones
2016-12-09 21:12                                                                                 ` Steven Rostedt
2016-12-04 23:04                                                                               ` bio linked list corruption Vegard Nossum
2016-12-05 11:10                                                                                 ` Vegard Nossum
2016-12-05 17:09                                                                                   ` Vegard Nossum
2016-12-05 17:21                                                                                     ` Dave Jones
2016-12-05 17:55                                                                                     ` Linus Torvalds
2016-12-05 19:11                                                                                       ` Vegard Nossum
2016-12-05 20:10                                                                                         ` Linus Torvalds
2016-12-05 20:35                                                                                           ` Linus Torvalds
2016-12-05 21:33                                                                                             ` Vegard Nossum
2016-12-06  8:42                                                                                               ` Vegard Nossum
2016-12-06  8:16                                                                                             ` Peter Zijlstra
2016-12-06  8:36                                                                                               ` Ingo Molnar
2016-12-06 16:33                                                                                               ` Linus Torvalds
2016-12-05 20:10                                                                                         ` Vegard Nossum
2016-12-05 18:11                                                                                 ` Andy Lutomirski
2016-12-05 18:25                                                                                   ` Linus Torvalds
2016-12-05 18:26                                                                                   ` Vegard Nossum
2016-10-26 23:19                                                             ` Chris Mason
2016-10-26 23:21                                                               ` Jens Axboe
2016-10-27  6:33                                                             ` Christoph Hellwig
2016-10-27 16:34                                                               ` Linus Torvalds
2016-10-27 16:36                                                                 ` Jens Axboe
2016-10-26 23:01                                                         ` Dave Jones
2016-10-26 23:05                                                           ` Jens Axboe
2016-10-26 22:52                                                       ` Jens Axboe
2016-10-26 22:07                                                 ` Linus Torvalds
2016-10-26 22:54                                                   ` Chris Mason
2016-10-27  5:41                                   ` Dave Chinner
2016-10-27 17:23                                     ` Dave Jones
2016-10-24 20:06                               ` Andy Lutomirski
2016-10-24 20:46                                 ` Linus Torvalds
2016-10-24 21:17                                   ` Linus Torvalds
2016-10-24 21:50                                     ` Linus Torvalds
2016-10-24 22:02                                       ` Chris Mason
2016-10-24 22:42                                   ` Andy Lutomirski
2016-10-25  0:00                                     ` Linus Torvalds
2016-10-25  1:09                                       ` Andy Lutomirski
2016-10-19 17:09           ` Philipp Hahn
2016-10-19 17:43             ` Linus Torvalds
2016-10-20  6:52               ` Ingo Molnar
2016-10-20  7:17                 ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161021200245.kahjzgqzdfyoe3uz@codemonkey.org.uk \
    --to=davej@codemonkey.org.uk \
    --cc=axboe@fb.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.