All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: hang on 3.9, 3.10-rc5
Date: Fri, 21 Jun 2013 16:54:54 +0200	[thread overview]
Message-ID: <20130621165454.5a87e2cb@fem.tu-ilmenau.de> (raw)
In-Reply-To: <loom.20130618T190955-784@post.gmane.org>

On Tue, 18 Jun 2013 17:19:04 +0000 (UTC)
Jon Nelson <jnelson+btrfs@jamponi.net> wrote:

> Josef Bacik <jbacik <at> fusionio.com> writes:
> 
> > 
> > On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.
> > > Is this is a known problem?  In this case there is no
> > > powercycling; just a regular ceph-osd workload.
> 
> ..
> 
> 
> I'm able to cause a complete kernel hang by defrag'ing even one 
> file on 3.9.X (3.9.0 through 3.9.4, so far).

I see similar behavior with autodefrag enabled. When fetching mails
with claws (piped through bogofilter) the whole system got stuck more
or less. I can switch between the tasks but everything involving I/O is
hanging. Most time I was able to solve this with the sync command in a
shell. I got only one time a backtrace from hung task checker:

Jun 20 12:37:47 localhost kernel: INFO: task btrfs-cleaner:771 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: btrfs-cleaner   D ffff88011fc91740     0   771      2 0x00000000
Jun 20 12:37:47 localhost kernel: ffff88011aaff530 0000000000000002 ffff880119929fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff880119929fd8 ffff88011aabddc0 ffff880119928000 ffff880119929cb8
Jun 20 12:37:47 localhost kernel: ffff8800aaac22b4 ffff8800aaac22b0 0000000000000000 ffff8800aaac22b8
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff8109856d>] ? ondemand_readahead+0x15d/0x200
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff81171579>] ? btrfs_defrag_file+0x41a/0xa4e
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff81159eb9>] ? btrfs_run_defrag_inodes+0x1f7/0x2d3
Jun 20 12:37:47 localhost kernel: [<ffffffff8115100e>] ? btrfs_run_delayed_iputs+0x44/0xbe
Jun 20 12:37:47 localhost kernel: [<ffffffff811459d5>] ? cleaner_kthread+0x89/0xf3
Jun 20 12:37:47 localhost kernel: [<ffffffff8114594c>] ? transaction_kthread+0x17a/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff81042c00>] ? kthread+0x7d/0x85
Jun 20 12:37:47 localhost kernel: [<ffffffff81040000>] ? thaw_workqueues+0xd3/0xff
Jun 20 12:37:47 localhost kernel: [<ffffffff81042b83>] ? __kthread_parkme+0x59/0x59
Jun 20 12:37:47 localhost kernel: [<ffffffff8148942c>] ? ret_from_fork+0x7c/0xb0
Jun 20 12:37:47 localhost kernel: [<ffffffff81042b83>] ? __kthread_parkme+0x59/0x59
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2384 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0  2384   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82dce20 0000000000000002 ffff8800c0c55fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0c55fd8 ffffffff81a10400 ffff8800c0c54000 ffff8800c0c55e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2385 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2385   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82dd5f0 0000000000000002 ffff8800c0d1dfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0d1dfd8 ffff88011aabddc0 ffff8800c0d1c000 ffff8800c0d1de90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2393 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2393   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82df530 0000000000000002 ffff8800c0daffd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0daffd8 ffff88011aabddc0 ffff8800ddd9ade0 ffff88011816e1e8
Jun 20 12:37:47 localhost kernel: 0000000000000000 0000000000000000 00000000003c0000 0000000000000000
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff8114aa11>] ? wait_current_trans.isra.19+0xaa/0xdc
Jun 20 12:37:47 localhost kernel: [<ffffffff81043492>] ? finish_wait+0x60/0x60
Jun 20 12:37:47 localhost kernel: [<ffffffff8114c663>] ? start_transaction+0x3b9/0x3e9
Jun 20 12:37:47 localhost kernel: [<ffffffff8114d71b>] ? __unlink_start_trans+0x46/0x37f
Jun 20 12:37:47 localhost kernel: [<ffffffff810d9b18>] ? __d_lookup+0xdb/0xed
Jun 20 12:37:47 localhost kernel: [<ffffffff810d242d>] ? path_lookupat+0x6d/0x2a7
Jun 20 12:37:47 localhost kernel: [<ffffffff811537be>] ? btrfs_unlink+0x1f/0x91
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3ee2>] ? vfs_unlink+0x61/0xa8
Jun 20 12:37:47 localhost kernel: [<ffffffff810d4000>] ? do_unlinkat+0xd7/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2398 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff880106fc0000     0  2398   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff880106fc0000 0000000000000002 ffff8800c0fa3fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0fa3fd8 ffff8800c82dd5f0 ffff8800c0fa2000 ffff8800c0fa3e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2400 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2400   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800d8ebc650 0000000000000002 ffff8800bece9fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800bece9fd8 ffff88011aabddc0 ffff8800bece8000 ffff8800bece9e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:12930 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0 12930   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800acf22710 0000000000000002 ffff8801018a1fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8801018a1fd8 ffff88011aabddc0 ffff8801018a0000 ffff8801018a1e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:18283 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0 18283   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8801030f8000 0000000000000002 ffff8800a8a9dfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800a8a9dfd8 ffffffff81a10400 ffff8800a8a9c000 ffff8800a8a9de90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:21478 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0 21478   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff88011aafc650 0000000000000002 ffff88010852bfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff88010852bfd8 ffffffff81a10400 ffff88010852a000 ffff88010852be90
Jun 20 12:37:49 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:49 localhost kernel: Call Trace:
Jun 20 12:37:49 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:49 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:49 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:49 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:49 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:49 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:49 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:49 localhost kernel: INFO: task bogofilter:21625 blocked for more than 120 seconds.
Jun 20 12:37:49 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:49 localhost kernel: bogofilter      D ffff88011fc91740     0 21625      1 0x00000000
Jun 20 12:37:49 localhost kernel: ffff88009f116d60 0000000000000002 ffff8800296e7fd8 0000000000011740
Jun 20 12:37:49 localhost kernel: ffff8800296e7fd8 ffff88011aabddc0 ffff88004b568da8 ffff88002970a800
Jun 20 12:37:49 localhost kernel: 0000000000000000 ffff8800aaa2a2e0 ffff8800aaa2a350 ffff88002970ac60
Jun 20 12:37:49 localhost kernel: Call Trace:
Jun 20 12:37:49 localhost kernel: [<ffffffff811763fd>] ? log_one_extent.isra.24+0x441/0x58b
Jun 20 12:37:49 localhost kernel: [<ffffffff81043492>] ? finish_wait+0x60/0x60
Jun 20 12:37:49 localhost kernel: [<ffffffff81177ee9>] ? btrfs_log_inode+0x5d0/0x71e
Jun 20 12:37:49 localhost kernel: [<ffffffff811791f5>] ? btrfs_log_inode_parent+0x1f1/0x2dd
Jun 20 12:37:49 localhost kernel: [<ffffffff81179c34>] ? btrfs_log_dentry_safe+0x31/0x45
Jun 20 12:37:49 localhost kernel: [<ffffffff81158e69>] ? btrfs_sync_file+0x11a/0x1e2
Jun 20 12:37:49 localhost kernel: [<ffffffff810e8211>] ? do_fsync+0x2b/0x50
Jun 20 12:37:49 localhost kernel: [<ffffffff8148966c>] ? tracesys+0x7e/0xe2
Jun 20 12:37:49 localhost kernel: [<ffffffff810e8402>] ? SyS_fdatasync+0xa/0xd
Jun 20 12:37:49 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2

After disabling autodefrag it didn't happen again. The kernel is the latest git.

      parent reply	other threads:[~2013-06-21 15:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-11 15:43 hang on 3.9, 3.10-rc5 Sage Weil
2013-06-11 16:21 ` Chris Mason
     [not found]   ` <alpine.DEB.2.00.1306121317300.27596@cobra.newdream.net>
2013-06-18 15:56     ` Sage Weil
2013-06-18 16:36       ` Chris Mason
2013-06-18 16:37 ` Josef Bacik
2013-06-18 16:59   ` Chris Mason
2013-06-20  3:44     ` Sage Weil
2013-06-20 21:56       ` Sage Weil
2013-06-21  0:42         ` Chris Mason
2013-06-21  1:00           ` Sage Weil
2013-06-21  1:04             ` Chris Mason
2013-06-21  1:09               ` Sage Weil
2013-08-09 18:04           ` Sage Weil
2013-08-09 18:08             ` Chris Mason
2013-06-18 17:19   ` Jon Nelson
2013-06-21  1:11     ` Chris Mason
2013-06-21  1:46       ` Jon Nelson
2013-06-21  5:40         ` Chris Murphy
2013-06-21  6:42         ` Clemens Eisserer
2013-06-25 15:52           ` Liu Bo
2013-06-21 11:30         ` Chris Mason
2013-06-21 13:14           ` Jon Nelson
2013-06-21 14:54     ` Johannes Hirte [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130621165454.5a87e2cb@fem.tu-ilmenau.de \
    --to=johannes.hirte@fem.tu-ilmenau.de \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.