All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergei Trofimovich <slyich@gmail.com>
To: Josef Bacik <josef@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>,
	linux-btrfs@vger.kernel.org, Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH] btrfs: fix warning in iput for bad-inode
Date: Tue, 30 Aug 2011 22:31:16 +0300	[thread overview]
Message-ID: <20110830223116.5fc21074@sf> (raw)
In-Reply-To: <4E5D25BD.3020901@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 11394 bytes --]

On Tue, 30 Aug 2011 14:02:37 -0400
Josef Bacik <josef@redhat.com> wrote:

> On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
> >> Running 'sync' program after the load does not finish and eats 
> >> 100%CPU busy-waiting for something in kernel.
> >> 
> >> It's easy to reproduce hang with patch for me. I just run
> >> liferea and sync after it. Without patch I haven't managed to
> >> hang btrfs up.
> > 
> > And I think it's another btrfs bug. I've managed to reproduce it 
> > _without_ your patch and _without_ autodefrag enabled by manually 
> > running the following commands: $ btrfs fi defrag 
> > file-with-20_000-extents $ sync
> > 
> > I think your patch just shuffles things a bit and forces
> > autodefrag races to pop-up sooner (which is good! :])
> > 
> 
> Sergei, can you do sysrq+w when this is happening, and maybe turn on
> the softlockup detector so we can see where sync is getting stuck?
> Thanks,

Sure. As I keep telling about 2 cases in IRC I will state both here explicitely:

==The First Issue (aka "The Hung sync()" case) ==

- it's an unpatched linus's v3.1-rc4-80-g0f43dd5
- /dev/root on / type btrfs (rw,noatime,compress=lzo)
- 50% full 30GB filesystem (usual nonmixed mode)

How I hung it:
    $ /usr/sbin/filefrag ~/.bogofilter/wordlist.db 
        /home/st/.bogofilter/wordlist.db: 19070 extents found
      the file is 138MB sqlite database for bayesian SPAM filter,
      it's being read and written every 20 minutes or so.
      Maybe, it was writtent even in defrag/sync time!
    $~/dev/git/btrfs-progs-unstable/btrfs fi defrag ~/.bogofilter/wordlist.db 
    $ sync
    ^C<hung in D-state>

I didn't try to reproduce it yet.
As for lockdep I'll try but I'm afraid I will fail to reproduce, but I'll try tomorrow.
I suspect I'll need to seriously fragment some file first down to such horrible state.

With help of David I've some (hopefully relevant) info:
    #!/bin/sh -x
    
    for i in $(ps aux|grep " D[+ ]\?"|awk '{print $2}'); do
            ps $i
            sudo cat /proc/$i/stack
    done

  PID TTY      STAT   TIME COMMAND
 1291 ?        D      0:00 [btrfs-endio-wri]
[<ffffffff8130055d>] btrfs_tree_read_lock+0x6d/0x120
[<ffffffff812b8e88>] btrfs_search_slot+0x698/0x8b0
[<ffffffff812c9e18>] btrfs_lookup_csum+0x68/0x190
[<ffffffff812ca10f>] __btrfs_lookup_bio_sums+0x1cf/0x3e0
[<ffffffff812ca371>] btrfs_lookup_bio_sums+0x11/0x20
[<ffffffff812d6a50>] btrfs_submit_bio_hook+0x140/0x170
[<ffffffff812ed594>] submit_one_bio+0x64/0xa0
[<ffffffff812f14f5>] extent_readpages+0xe5/0x100
[<ffffffff812d7aaa>] btrfs_readpages+0x1a/0x20
[<ffffffff810a6a02>] __do_page_cache_readahead+0x1d2/0x280
[<ffffffff810a6d8c>] ra_submit+0x1c/0x20
[<ffffffff810a6ebd>] ondemand_readahead+0x12d/0x270
[<ffffffff810a70cc>] page_cache_sync_readahead+0x2c/0x40
[<ffffffff81309987>] __load_free_space_cache+0x1a7/0x5b0
[<ffffffff81309e61>] load_free_space_cache+0xd1/0x190
[<ffffffff812be07b>] cache_block_group+0xab/0x290
[<ffffffff812c3def>] find_free_extent.clone.71+0x39f/0xab0
[<ffffffff812c5160>] btrfs_reserve_extent+0xe0/0x170
[<ffffffff812c56df>] btrfs_alloc_free_block+0xcf/0x330
[<ffffffff812b498d>] __btrfs_cow_block+0x11d/0x4a0
[<ffffffff812b4df8>] btrfs_cow_block+0xe8/0x1a0
[<ffffffff812b8965>] btrfs_search_slot+0x175/0x8b0
[<ffffffff812c9e18>] btrfs_lookup_csum+0x68/0x190
[<ffffffff812caf6e>] btrfs_csum_file_blocks+0xbe/0x670
[<ffffffff812d7d91>] add_pending_csums.clone.39+0x41/0x60
[<ffffffff812da528>] btrfs_finish_ordered_io+0x218/0x310
[<ffffffff812da635>] btrfs_writepage_end_io_hook+0x15/0x20
[<ffffffff8130c71a>] end_compressed_bio_write+0x7a/0xe0
[<ffffffff811146f8>] bio_endio+0x18/0x30
[<ffffffff812cd8fc>] end_workqueue_fn+0xec/0x120
[<ffffffff812fb0ac>] worker_loop+0xac/0x520
[<ffffffff8105d486>] kthread+0x96/0xa0
[<ffffffff815f9214>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
  PID TTY      STAT   TIME COMMAND
 1296 ?        D      0:00 [btrfs-transacti]
[<ffffffff812d3d6e>] btrfs_commit_transaction+0x22e/0x870
[<ffffffff812cd5bb>] transaction_kthread+0x26b/0x280
[<ffffffff8105d486>] kthread+0x96/0xa0
[<ffffffff815f9214>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
  PID TTY      STAT   TIME COMMAND
 2891 pts/0    D+     0:00 sync
[<ffffffff8109d069>] sleep_on_page+0x9/0x10
[<ffffffff8109d28e>] wait_on_page_bit+0x6e/0x80
[<ffffffff8109d9aa>] filemap_fdatawait_range+0xfa/0x190
[<ffffffff8109da62>] filemap_fdatawait+0x22/0x30
[<ffffffff8110ae5a>] sync_inodes_sb+0x17a/0x1e0
[<ffffffff8110eae8>] __sync_filesystem+0x78/0x80
[<ffffffff8110eb07>] sync_one_sb+0x17/0x20
[<ffffffff810e9cc5>] iterate_supers+0x95/0xf0
[<ffffffff8110ea2b>] sync_filesystems+0x1b/0x20
[<ffffffff8110eb8c>] sys_sync+0x1c/0x40
[<ffffffff815f813b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
  PID TTY      STAT   TIME COMMAND
cat: /proc/4454/stack: Нет такого файла или каталога

disassebbled btrfs_tree_read_lock:

ffffffff813004f0 <btrfs_tree_read_lock> push   %rbp
ffffffff813004f1 <btrfs_tree_read_lock+0x1> mov    %gs:0xb5c0,%rax
ffffffff813004fa <btrfs_tree_read_lock+0xa> mov    %rsp,%rbp
ffffffff813004fd <btrfs_tree_read_lock+0xd> push   %r15
ffffffff813004ff <btrfs_tree_read_lock+0xf> push   %r14
ffffffff81300501 <btrfs_tree_read_lock+0x11> lea    0x6c(%rdi),%r14
ffffffff81300505 <btrfs_tree_read_lock+0x15> push   %r13
ffffffff81300507 <btrfs_tree_read_lock+0x17> lea    0x70(%rdi),%r13
ffffffff8130050b <btrfs_tree_read_lock+0x1b> push   %r12
ffffffff8130050d <btrfs_tree_read_lock+0x1d> lea    -0x60(%rbp),%r12
ffffffff81300511 <btrfs_tree_read_lock+0x21> push   %rbx
ffffffff81300512 <btrfs_tree_read_lock+0x22> lea    0x18(%r12),%r15
ffffffff81300517 <btrfs_tree_read_lock+0x27> lea    0x5c(%rdi),%rbx
ffffffff8130051b <btrfs_tree_read_lock+0x2b> sub    $0x58,%rsp
ffffffff8130051f <btrfs_tree_read_lock+0x2f> mov    %rdi,-0x70(%rbp)
ffffffff81300523 <btrfs_tree_read_lock+0x33> mov    %rax,-0x68(%rbp)
ffffffff81300527 <btrfs_tree_read_lock+0x37> mov    %rax,-0x78(%rbp)
ffffffff8130052b <btrfs_tree_read_lock+0x3b> nopl   0x0(%rax,%rax,1)
ffffffff81300530 <btrfs_tree_read_lock+0x40> mov    (%rbx),%eax
ffffffff81300532 <btrfs_tree_read_lock+0x42> test   %eax,%eax
ffffffff81300534 <btrfs_tree_read_lock+0x44> je     ffffffff8130057e <btrfs_tree_read_lock+0x8e>
ffffffff81300536 <btrfs_tree_read_lock+0x46> mov    -0x68(%rbp),%rax
ffffffff8130053a <btrfs_tree_read_lock+0x4a> movq   $0x0,-0x60(%rbp)
ffffffff81300542 <btrfs_tree_read_lock+0x52> movq   $0xffffffff8105d950,-0x50(%rbp)
ffffffff8130054a <btrfs_tree_read_lock+0x5a> mov    %r15,-0x48(%rbp)
ffffffff8130054e <btrfs_tree_read_lock+0x5e> mov    %r15,-0x40(%rbp)
ffffffff81300552 <btrfs_tree_read_lock+0x62> mov    %rax,-0x58(%rbp)
ffffffff81300556 <btrfs_tree_read_lock+0x66> jmp    ffffffff8130055d <btrfs_tree_read_lock+0x6d>
ffffffff81300558 <btrfs_tree_read_lock+0x68> callq  ffffffff815f4670 <schedule>
ffffffff8130055d <btrfs_tree_read_lock+0x6d> mov    $0x2,%edx
ffffffff81300562 <btrfs_tree_read_lock+0x72> mov    %r12,%rsi
ffffffff81300565 <btrfs_tree_read_lock+0x75> mov    %r13,%rdi
ffffffff81300568 <btrfs_tree_read_lock+0x78> callq  ffffffff8105db90 <prepare_to_wait>
ffffffff8130056d <btrfs_tree_read_lock+0x7d> mov    (%rbx),%eax
ffffffff8130056f <btrfs_tree_read_lock+0x7f> test   %eax,%eax
ffffffff81300571 <btrfs_tree_read_lock+0x81> jne    ffffffff81300558 <btrfs_tree_read_lock+0x68>
ffffffff81300573 <btrfs_tree_read_lock+0x83> mov    %r12,%rsi
ffffffff81300576 <btrfs_tree_read_lock+0x86> mov    %r13,%rdi
ffffffff81300579 <btrfs_tree_read_lock+0x89> callq  ffffffff8105da80 <finish_wait>
ffffffff8130057e <btrfs_tree_read_lock+0x8e> mov    %r14,%rdi
ffffffff81300581 <btrfs_tree_read_lock+0x91> callq  ffffffff815f76d0 <_raw_read_lock>
ffffffff81300586 <btrfs_tree_read_lock+0x96> mov    (%rbx),%eax
ffffffff81300588 <btrfs_tree_read_lock+0x98> test   %eax,%eax
ffffffff8130058a <btrfs_tree_read_lock+0x9a> je     ffffffff813005eb <btrfs_tree_read_lock+0xfb>
ffffffff8130058c <btrfs_tree_read_lock+0x9c> mov    %r14,%rdi
ffffffff8130058f <btrfs_tree_read_lock+0x9f> callq  ffffffff815f73f0 <_raw_read_unlock>
ffffffff81300594 <btrfs_tree_read_lock+0xa4> mov    (%rbx),%eax
ffffffff81300596 <btrfs_tree_read_lock+0xa6> test   %eax,%eax
ffffffff81300598 <btrfs_tree_read_lock+0xa8> je     ffffffff81300530 <btrfs_tree_read_lock+0x40>
ffffffff8130059a <btrfs_tree_read_lock+0xaa> mov    -0x78(%rbp),%rax
ffffffff8130059e <btrfs_tree_read_lock+0xae> movq   $0x0,-0x60(%rbp)
ffffffff813005a6 <btrfs_tree_read_lock+0xb6> movq   $0xffffffff8105d950,-0x50(%rbp)
ffffffff813005ae <btrfs_tree_read_lock+0xbe> mov    %r15,-0x48(%rbp)
ffffffff813005b2 <btrfs_tree_read_lock+0xc2> mov    %r15,-0x40(%rbp)
ffffffff813005b6 <btrfs_tree_read_lock+0xc6> mov    %rax,-0x58(%rbp)
ffffffff813005ba <btrfs_tree_read_lock+0xca> jmp    ffffffff813005c5 <btrfs_tree_read_lock+0xd5>
ffffffff813005bc <btrfs_tree_read_lock+0xcc> nopl   0x0(%rax)
ffffffff813005c0 <btrfs_tree_read_lock+0xd0> callq  ffffffff815f4670 <schedule>
ffffffff813005c5 <btrfs_tree_read_lock+0xd5> mov    $0x2,%edx
ffffffff813005ca <btrfs_tree_read_lock+0xda> mov    %r12,%rsi
ffffffff813005cd <btrfs_tree_read_lock+0xdd> mov    %r13,%rdi
ffffffff813005d0 <btrfs_tree_read_lock+0xe0> callq  ffffffff8105db90 <prepare_to_wait>
ffffffff813005d5 <btrfs_tree_read_lock+0xe5> mov    (%rbx),%eax
ffffffff813005d7 <btrfs_tree_read_lock+0xe7> test   %eax,%eax
ffffffff813005d9 <btrfs_tree_read_lock+0xe9> jne    ffffffff813005c0 <btrfs_tree_read_lock+0xd0>
ffffffff813005db <btrfs_tree_read_lock+0xeb> mov    %r12,%rsi
ffffffff813005de <btrfs_tree_read_lock+0xee> mov    %r13,%rdi
ffffffff813005e1 <btrfs_tree_read_lock+0xf1> callq  ffffffff8105da80 <finish_wait>
ffffffff813005e6 <btrfs_tree_read_lock+0xf6> jmpq   ffffffff81300530 <btrfs_tree_read_lock+0x40>
ffffffff813005eb <btrfs_tree_read_lock+0xfb> mov    -0x70(%rbp),%rax
ffffffff813005ef <btrfs_tree_read_lock+0xff> lock incl 0x58(%rax)
ffffffff813005f3 <btrfs_tree_read_lock+0x103> lock incl 0x64(%rax)
ffffffff813005f7 <btrfs_tree_read_lock+0x107> add    $0x58,%rsp
ffffffff813005fb <btrfs_tree_read_lock+0x10b> pop    %rbx
ffffffff813005fc <btrfs_tree_read_lock+0x10c> pop    %r12
ffffffff813005fe <btrfs_tree_read_lock+0x10e> pop    %r13
ffffffff81300600 <btrfs_tree_read_lock+0x110> pop    %r14
ffffffff81300602 <btrfs_tree_read_lock+0x112> pop    %r15
ffffffff81300604 <btrfs_tree_read_lock+0x114> leaveq 
ffffffff81300605 <btrfs_tree_read_lock+0x115> retq   
ffffffff81300606 <btrfs_tree_read_lock+0x116> nopw   %cs:0x0(%rax,%rax,1)

==The Second Issue (aka "The Busy Looping sync()" case) ==

The box is different from first, so conditions are a bit different.

- it's a _patched_ linus's 3.1.0-rc3-00271-g3230739 (+patch from $SUBJ on top)
- /dev/root on / type btrfs (rw,noatime,autodefrag)
  (note autodefrag!)
- 15% full 594GB filesystem (usual nonmixed mode)

How I hung it:
    $ liferea
    <wait it to calm down. it does a lot of SQLite reads/writes>
    $ sync
    Got CPU is 100% loaded

Perfectly reproducible.
I don't remembed it it exits or hangs. Will send detailed followup today.

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

  reply	other threads:[~2011-08-30 19:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-17 18:56 [PATCH] btrfs: fix warning in iput for bad-inode Konstantin Khlebnikov
2011-08-29  3:34 ` Sergei Trofimovich
2011-08-30 16:53   ` Sergei Trofimovich
2011-08-30 18:02     ` Josef Bacik
2011-08-30 19:31       ` Sergei Trofimovich [this message]
2011-08-30 19:40         ` Josef Bacik
2011-08-30 19:45           ` Josef Bacik
2011-08-30 20:46             ` Sergei Trofimovich
2011-08-30 21:17               ` Sergei Trofimovich
2011-09-02 17:01                 ` slyich
2011-09-07  9:18                   ` David Sterba
2011-09-01  2:45 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110830223116.5fc21074@sf \
    --to=slyich@gmail.com \
    --cc=chris.mason@oracle.com \
    --cc=josef@redhat.com \
    --cc=khlebnikov@openvz.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.