linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Cc: miaox@cn.fujitsu.com, Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	zheng.yan@oracle.com, linux-fsdevel@vger.kernel.org
Subject: Re: kernel BUG at fs/btrfs/extent-tree.c:1353
Date: Thu, 29 Jul 2010 20:54:58 +0200	[thread overview]
Message-ID: <4C51CE82.7040901@kernel.dk> (raw)
In-Reply-To: <201007291909.51978.johannes.hirte@fem.tu-ilmenau.de>

On 07/29/2010 07:09 PM, Johannes Hirte wrote:
> Am Donnerstag 22 Juli 2010, 20:07:23 schrieb Johannes Hirte:
>> Am Montag 19 Juli 2010, 10:01:46 schrieb Miao Xie:
>>> On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
>>>> Am Donnerstag 15 Juli 2010, 02:11:04 schrieb Dave Chinner:
>>>>> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
>>>>>> Am Donnerstag 08 Juli 2010, 16:31:09 schrieb Chris Mason:
>>>>>> I'm not sure if btrfs is to blame for this error. After the errors I
>>>>>> switched to XFS on this system and got now this error:
>>>>>>
>>>>>> ls -l .kde4/share/apps/akregator/data/
>>>>>> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml:
>>>>>> Structure needs cleaning
>>>>>> total 4
>>>>>> ?????????? ? ?    ?        ?            ? feeds.opml
>>>>>
>>>>> What is the error reported in dmesg when the XFS filesytem shuts down?
>>>>
>>>> Nothing. I double checked the logs. There are only the messages when
>>>> mounting the filesystem. No other errors are reported than the
>>>> inaccessible file and the output from xfs_check.
>>>
>>> Is there anything wrong with your disks or memory?
>>> Sometimes the bad memory can break the filesystem. I have met this kind
>>> of problem some time ago.
>>
>> I don't think that's the case. I've checked the RAM with memtest86+ and got
>> no errors. I got the errors with two different disks, the first one with
>> btrfs the second one now with XFS. Before changing to the second disk,
>> I've run badblocks on it to be sure it has no errors.
> 
> I think I've found it. The bug was introduced by 
> 
> commit 7f0e7bed936a0c422641a046551829a01341dd80
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Jun 8 18:14:34 2010 +0200
> 
>     writeback: fix writeback completion notifications
>     
>     The code dealing with bdi_work->state and completion of a bdi_work is a
>     major mess currently.  This patch makes sure we directly use one set of
>     flags to deal with it, and use it consistently, which means:
>     
>      - always notify about completion from the rcu callback.  We only ever
>        wait for it from on-stack callers, so this simplification does not
>        even cause a theoretical slowdown currently.  It also makes sure we
>        don't miss out on the notification if we ever add other callers to
>        wait for it.
>      - make earlier completion notification depending on the on-stack
>        allocation, not the sync mode.  If we introduce new callers that
>        want to do WB_SYNC_NONE writeback from on-stack callers this will
>        be nessecary.
>     
>     Also rename bdi_wait_on_work_clear to bdi_wait_on_work_done and inline
>     a few small functions into their only caller to make the code
>     understandable.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> 
> and seems to be fixed by
> 
> commit 83ba7b071f30f7c01f72518ad72d5cd203c27502
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Jul 6 08:59:53 2010 +0200
> 
>     writeback: simplify the write back thread queue
>     
>     First remove items from work_list as soon as we start working on them.This
>     means we don't have to track any pending or visited state and can get
>     rid of all the RCU magic freeing the work items - we can simply free
>     them once the operation has finished.  Second use a real completion for
>     tracking synchronous requests - if the caller sets the completion pointer
>     we complete it, otherwise use it as a boolean indicator that we can free
>     the work item directly.  Third unify struct wb_writeback_args and struct
>     bdi_work into a single data structure, wb_writeback_work.  Previous we
>     set all parameters into a struct wb_writeback_args, copied it into
>     struct bdi_work, copied it again on the stack to use it there.  Instead
>     of just allocate one structure dynamically or on the stack and use it
>     all the way through the stack.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> 
> I was able to reproduce the bug by unpacking a big tar-file and
> deleting this files multiple times. Normally with btrfs the kernel
> crashed within 20 runs. After commit
> 83ba7b071f30f7c01f72518ad72d5cd203c27502 it survived more than 500
> runs.

Makes sense, that first commit would potentially pass in stack cruft as
the wbc arg. So I think we can safely consider it fixed now.

-- 
Jens Axboe

  reply	other threads:[~2010-07-29 18:54 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-08 14:27 kernel BUG at fs/btrfs/extent-tree.c:1353 Johannes Hirte
2010-07-08 14:31 ` Chris Mason
2010-07-08 15:40   ` Johannes Hirte
2010-07-14 15:25   ` Johannes Hirte
2010-07-15  0:11     ` Dave Chinner
2010-07-15 18:14       ` Johannes Hirte
2010-07-16 14:59         ` Johannes Hirte
2010-07-19  8:01         ` Miao Xie
2010-07-22 18:07           ` Johannes Hirte
2010-07-23 11:02             ` Daniel J Blueman
2010-07-23 11:14             ` Bob Copeland
2010-07-29 17:09             ` Johannes Hirte
2010-07-29 18:54               ` Jens Axboe [this message]
2010-08-13 12:19   ` David Woodhouse
2010-07-11 12:28 ` Johannes Hirte
2010-07-13 12:23   ` Johannes Hirte
2010-07-15 18:30     ` csum errors Johannes Hirte
2010-07-15 19:03       ` Chris Mason
2010-07-15 19:32         ` Johannes Hirte
2010-07-15 19:35           ` Chris Mason
2010-07-15 20:00             ` Johannes Hirte
2010-07-17  4:55             ` Brian Rogers
2010-08-10 21:06               ` Sebastian 'gonX' Jensen
2010-08-14  7:05                 ` Brian Rogers
2010-08-14 11:10                   ` Sebastian 'gonX' Jensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C51CE82.7040901@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=johannes.hirte@fem.tu-ilmenau.de \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miaox@cn.fujitsu.com \
    --cc=zheng.yan@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).