* OSD blocked for more than 120 seconds
From: Martin Mailand @ 2011-10-13 20:39 UTC (permalink / raw)
To: ceph-devel, linux-btrfs
Hi,
on one of my OSDs the ceph-osd task hung for more than 120 sec. The OSD
had almost no load, so it cannot be an overload problem. I think it is a
btrfs problem; could someone clarify?
This was in the dmesg.
[29280.890040] INFO: task btrfs-cleaner:1708 blocked for more than 120 seconds.
[29280.905659] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[29280.922916] btrfs-cleaner D ffff8801153bdf80 0 1708 2 0x00000000
[29280.922931] ffff88011698bbd0 0000000000000046 ffff88011698bb90 ffffffff81090d7d
[29280.922960] ffff880100000000 ffff88011698bfd8 ffff88011698a000 ffff88011698bfd8
[29280.922988] ffffffff81a0d020 ffff8801153bdbc0 ffff88011698bbd0 0000000181090d7d
[29280.923018] Call Trace:
[29280.923043] [<ffffffff81090d7d>] ? ktime_get_ts+0xad/0xe0
[29280.923062] [<ffffffff8110cf10>] ? __lock_page+0x70/0x70
[29280.923082] [<ffffffff815d93df>] schedule+0x3f/0x60
[29280.923098] [<ffffffff815d948c>] io_schedule+0x8c/0xd0
[29280.923114] [<ffffffff8110cf1e>] sleep_on_page+0xe/0x20
[29280.923130] [<ffffffff815d9c6f>] __wait_on_bit+0x5f/0x90
[29280.923147] [<ffffffff8110d168>] wait_on_page_bit+0x78/0x80
[29280.923165] [<ffffffff81086bd0>] ? autoremove_wake_function+0x40/0x40
[29280.923227] [<ffffffffa0065ecb>] btrfs_defrag_file+0x4fb/0xc10 [btrfs]
[29280.923246] [<ffffffff8117f6ac>] ? find_inode+0xac/0xb0
[29280.923281] [<ffffffffa003a2d0>] ? btrfs_clean_old_snapshots+0x160/0x160 [btrfs]
[29280.923302] [<ffffffff812e369b>] ? radix_tree_lookup+0xb/0x10
[29280.923337] [<ffffffffa0034f62>] ? btrfs_read_fs_root_no_name+0x1c2/0x2e0 [btrfs]
[29280.923375] [<ffffffffa004897e>] btrfs_run_defrag_inodes+0x15e/0x210 [btrfs]
[29280.923410] [<ffffffffa003278f>] cleaner_kthread+0x17f/0x1a0 [btrfs]
[29280.923443] [<ffffffffa0032610>] ? btrfs_congested_fn+0xb0/0xb0 [btrfs]
[29280.923460] [<ffffffff81086436>] kthread+0x96/0xa0
[29280.923477] [<ffffffff815e5934>] kernel_thread_helper+0x4/0x10
[29280.923493] [<ffffffff810863a0>] ? flush_kthread_worker+0xb0/0xb0
[29280.923510] [<ffffffff815e5930>] ? gs_change+0x13/0x13
[29280.923521] INFO: task btrfs-transacti:1709 blocked for more than 120 seconds.
[29280.939551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[29280.956782] btrfs-transacti D ffff880115745f80 0 1709 2 0x00000000
[29280.956792] ffff880115e6fd50 0000000000000046 ffff880115e6fd20 ffff880111a5a3e0
[29280.956800] ffff880100000000 ffff880115e6ffd8 ffff880115e6e000 ffff880115e6ffd8
[29280.956809] ffffffff81a0d020 ffff880115745bc0 0000000000000282 0000000116758450
[29280.956817] Call Trace:
[29280.956827] [<ffffffff815d93df>] schedule+0x3f/0x60
[29280.956855] [<ffffffffa0037de5>] wait_for_commit.clone.16+0x55/0x90 [btrfs]
[29280.956864] [<ffffffff81086b90>] ? wake_up_bit+0x40/0x40
[29280.956891] [<ffffffffa0039726>] btrfs_commit_transaction+0x776/0x860 [btrfs]
[29280.956900] [<ffffffff8115653c>] ? kmem_cache_alloc+0x3c/0x130
[29280.956907] [<ffffffff815db6fe>] ? _raw_spin_lock+0xe/0x20
[29280.956933] [<ffffffffa003879d>] ? join_transaction.clone.24+0x5d/0x240 [btrfs]
[29280.956941] [<ffffffff81086b90>] ? wake_up_bit+0x40/0x40
[29280.956966] [<ffffffffa0033323>] transaction_kthread+0x273/0x290 [btrfs]
[29280.956991] [<ffffffffa00330b0>] ? check_leaf.clone.68+0x320/0x320 [btrfs]
[29280.956999] [<ffffffff81086436>] kthread+0x96/0xa0
[29280.957007] [<ffffffff815e5934>] kernel_thread_helper+0x4/0x10
[29280.957015] [<ffffffff810863a0>] ? flush_kthread_worker+0xb0/0xb0
[29280.957022] [<ffffffff815e5930>] ? gs_change+0x13/0x13
[29280.957030] INFO: task ceph-osd:1855 blocked for more than 120 seconds.
[29280.971860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[29280.989164] ceph-osd D ffff880114865f80 0 1855 1 0x00000004
[29280.989173] ffff880115229c48 0000000000000082 ffff880115229bf8 ffff880115230fb8
[29280.989181] ffff880115229c00 ffff880115229fd8 ffff880115228000 ffff880115229fd8
[29280.989189] ffff8801151744d0 ffff880114865bc0 0000000000000282 ffff880117864208
[29280.989209] Call Trace:
[29280.989226] [<ffffffff815d93df>] schedule+0x3f/0x60
[29280.989263] [<ffffffffa003a017>] btrfs_commit_transaction_async+0x1f7/0x270 [btrfs]
[29280.989296] [<ffffffffa002375b>] ? block_rsv_add_bytes+0x5b/0x80 [btrfs]
[29280.989314] [<ffffffff81086b90>] ? wake_up_bit+0x40/0x40
[29280.989344] [<ffffffffa00237ba>] ? block_rsv_migrate_bytes+0x3a/0x50 [btrfs]
[29280.989380] [<ffffffffa00655b1>] btrfs_mksubvol+0x301/0x3a0 [btrfs]
[29280.989416] [<ffffffffa0065750>] btrfs_ioctl_snap_create_transid+0x100/0x160 [btrfs]
[29280.989453] [<ffffffffa00658d2>] btrfs_ioctl_snap_create_v2.clone.57+0xa2/0x100 [btrfs]
[29280.989491] [<ffffffffa0066d5d>] btrfs_ioctl+0x1fd/0xe20 [btrfs]
[29280.989507] [<ffffffff811657c2>] ? do_sync_write+0xd2/0x110
[29280.989525] [<ffffffff811a053d>] ? fsnotify+0x1cd/0x2e0
[29280.989541] [<ffffffff811779f8>] do_vfs_ioctl+0x98/0x540
[29280.989557] [<ffffffff81177f31>] sys_ioctl+0x91/0xa0
[29280.989575] [<ffffffff815e37c2>] system_call_fastpath+0x16/0x1b
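As an aside for anyone triaging such reports: the messages above come from the kernel's hung-task detector, which flags tasks stuck in uninterruptible sleep (state D) for longer than a configurable threshold. A minimal sketch of inspecting and raising that threshold (the value 300 is purely illustrative):

```shell
# Show the current hung-task warning threshold in seconds; the fallback
# guards against kernels built without the detector.
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null || echo "detector not built in"

# Raise the threshold instead of silencing the warning with 0
# (needs root; left commented out here on purpose):
# echo 300 > /proc/sys/kernel/hung_task_timeout_secs
```

Note that raising the threshold only delays the warning; it does not address whatever is blocking the task.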
Best Regards,
martin
* Re: OSD blocked for more than 120 seconds
From: Wido den Hollander @ 2011-10-14 9:38 UTC (permalink / raw)
To: martin; +Cc: ceph-devel
Hi,
On Thu, 2011-10-13 at 22:39 +0200, Martin Mailand wrote:
> Hi,
> on one of my OSDs the ceph-osd task hung for more than 120 sec. The OSD
> had almost no load, so it cannot be an overload problem. I think it is a
> btrfs problem; could someone clarify?
>
> This was in the dmesg.
>
> [29280.890040] INFO: task btrfs-cleaner:1708 blocked for more than 120 seconds.
Judging by the fact that I see btrfs-cleaner and btrfs-transaction
blocking, I would guess this is a btrfs bug/hang.
Which kernel are you using?
Wido
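A quick way to gather what Wido is asking for on the OSD node (the btrfs module query is optional and may not apply when btrfs is built in):

```shell
# Report the running kernel release; this is usually the first thing
# asked for when triaging btrfs hangs.
uname -r

# If btrfs is loaded as a module, show its metadata as well:
modinfo btrfs 2>/dev/null | head -n 3 || true
```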
* Re: OSD blocked for more than 120 seconds
From: Christian Brunner @ 2011-10-15 19:33 UTC (permalink / raw)
To: Wido den Hollander; +Cc: martin, ceph-devel
I'm not seeing the same problem, but I've experienced something similar:
As you might know, I had serious performance problems with btrfs some
months ago; after that I switched to ext4 and had other problems
there. Last Saturday I decided to give josef's current btrfs git repo
a try in our ceph cluster.
Everything performed well at first, but after a day I noticed that
btrfs-cleaner was wasting more and more time in
btrfs_clean_old_snapshots. When we reached load 20 on the OSDs I
rebooted the nodes and everything was back to normal. But again
after a few hours the load started to rise.
My fix for the moment was to turn off the btrfs snapshot
feature in ceph with:
filestore btrfs snaps = 0
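For context, a minimal ceph.conf fragment with this option might look like the following; the [osd] section placement is an assumption based on the Ceph wiki of that era, not something stated in this thread:

```ini
[osd]
    ; disable the btrfs snapshot/rollback commit mode; the filestore
    ; then falls back to writeahead journaling
    filestore btrfs snaps = 0
```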
Now I have good performance, low iowait values on the disks, and I
haven't seen our btrfs warning so far either.
I don't know what the implications are (does this enable writeahead
journaling in ceph?), but for me it's the only setup that does the job
at the moment.
Regards,
Christian
2011/10/14 Wido den Hollander <wido@widodh.nl>:
> Hi,
>
> On Thu, 2011-10-13 at 22:39 +0200, Martin Mailand wrote:
>> Hi,
>> on one of my OSDs the ceph-osd task hung for more than 120 sec. The OSD
>> had almost no load, so it cannot be an overload problem. I think it is a
>> btrfs problem; could someone clarify?
>>
>> This was in the dmesg.
>>
>> [29280.890040] INFO: task btrfs-cleaner:1708 blocked for more than 120 seconds.
>
> Judging by the fact that I see btrfs-cleaner and btrfs-transaction
> blocking, I would guess this is a btrfs bug/hang.
>
> Which kernel are you using?
>
> Wido
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: OSD blocked for more than 120 seconds
From: Martin Mailand @ 2011-10-15 20:01 UTC (permalink / raw)
To: chb; +Cc: Wido den Hollander, ceph-devel
Hi Christian,
I have a very similar experience. I also used josef's tree and btrfs
snaps = 0; the next problem I had then was excessive fragmentation, so I
used this patch http://marc.info/?l=linux-btrfs&m=131495014823121&w=2
and changed the btrfs options to (btrfs options =
noatime,nodatacow,autodefrag), which kept the fragmentation under control.
But even with this setup, after a few days the load on the osd is unbearable.
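One way to confirm such mount options actually took effect is to read them back from /proc/mounts; the device and mount point below are made-up examples, not paths from this thread:

```shell
# Each /proc/mounts line is: device mountpoint fstype options dump pass.
# Field 4 holds the active mount options; in practice you would read
# /proc/mounts itself rather than the echoed sample line.
echo '/dev/sdb1 /data/osd.0 btrfs rw,noatime,nodatacow,autodefrag 0 0' |
  awk '$2 == "/data/osd.0" { print $4 }'
# prints rw,noatime,nodatacow,autodefrag
```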
As far as I understand the documentation, if you disable the btrfs
snapshot functionality the writeahead journal is activated.
http://ceph.newdream.net/wiki/Ceph.conf
And I get this in the logs.
mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is
not enabled
May I ask what kind of problems you had with ext4? Because I am
looking into this direction as well.
Best Regards,
martin
Christian Brunner schrieb:
> I'm not seeing the same problem, but I've experienced something similar:
>
> As you might know, I had serious performance problems with btrfs some
> months ago; after that I switched to ext4 and had other problems
> there. Last Saturday I decided to give josef's current btrfs git repo
> a try in our ceph cluster.
>
> Everything performed well at first, but after a day I noticed that
> btrfs-cleaner was wasting more and more time in
> btrfs_clean_old_snapshots. When we reached load 20 on the OSDs I
> rebooted the nodes and everything was back to normal. But again
> after a few hours the load started to rise.
>
> My fix for the moment was to turn off the btrfs snapshot
> feature in ceph with:
>
> filestore btrfs snaps = 0
>
> Now I have good performance, low iowait values on the disks, and I
> haven't seen our btrfs warning so far either.
>
> I don't know what the implications are (does this enable writeahead
> journaling in ceph?), but for me it's the only setup that does the job
> at the moment.
>
> Regards,
> Christian
* Re: OSD blocked for more than 120 seconds
From: Christian Brunner @ 2011-10-17 9:40 UTC (permalink / raw)
To: martin; +Cc: Wido den Hollander, ceph-devel
2011/10/15 Martin Mailand <martin@tuxadero.com>:
> Hi Christian,
> I have a very similar experience, I also used josef's tree and btrfs snaps =
> 0, the next problem I had than was excessive fragmentation, so I used this
> patch http://marc.info/?l=linux-btrfs&m=131495014823121&w=2, and changed the
> btrfs option to (btrfs options = noatime,nodatacow,autodefrag) that kept the
> fragmentation under control.
> But even with this setup after a few days the load on the osd is unbearable.
How did you find out about your fragmentation issues? Was it just a
performance problem?
> As far as I understand the documentation, if you disable the btrfs
> snapshot functionality the writeahead journal is activated.
> http://ceph.newdream.net/wiki/Ceph.conf
> And I get this in the logs.
> mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not
> enabled
>
> May I ask what kind of problems you had with ext4? Because I am looking
> into this direction as well.
You can read about our ext4 problems here:
http://marc.info/?l=ceph-devel&m=131201869703245&w=2
Our bug report with RedHat didn't make any progress for a long time,
but last week RedHat made two suggestions:
- If you configure ceph with 'filestore flusher = false', do you see
any different behavior?
- If you mount with -o noauto_da_alloc does it change anything?
Since I have just migrated to btrfs, it is hard for me to check this,
but I'll try to do it as soon as I can get hold of some extra
hardware.
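Spelled out as configuration, the first suggestion would look roughly like this (the [osd] section placement is an assumption); the second is an ext4 mount option, shown here only as a comment since it is applied at mount time:

```ini
[osd]
    ; RedHat's first suggestion: bypass the filestore flusher thread
    filestore flusher = false

; second suggestion, applied at mount time rather than in ceph.conf:
;   mount -o remount,noauto_da_alloc <device> <osd-data-dir>
```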
Regards,
Christian
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: OSD blocked for more than 120 seconds
From: Martin Mailand @ 2011-10-17 11:49 UTC (permalink / raw)
To: chb; +Cc: Wido den Hollander, ceph-devel
Am 17.10.2011 11:40, schrieb Christian Brunner:
> 2011/10/15 Martin Mailand<martin@tuxadero.com>:
>> Hi Christian,
>> I have a very similar experience, I also used josef's tree and btrfs snaps =
>> 0, the next problem I had than was excessive fragmentation, so I used this
>> patch http://marc.info/?l=linux-btrfs&m=131495014823121&w=2, and changed the
>> btrfs option to (btrfs options = noatime,nodatacow,autodefrag) that kept the
>> fragmentation under control.
>> But even with this setup after a few days the load on the osd is unbearable.
>
> How did you find out about your fragmentation issues? Was it just a
> performance problem?
>
I used filefrag to show the number of extents; after the patch I have
on average 1.14 extents per 4 MB ceph object on the OSD.
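That per-object average can be computed from filefrag output along these lines; the sample lines stand in for real OSD object files, whose paths are not given in the thread:

```shell
# filefrag prints "<file>: N extents found" per file; averaging N over
# all files gives the fragmentation figure quoted above.
printf '%s\n' \
  'obj1: 1 extent found' \
  'obj2: 2 extents found' \
  'obj3: 1 extent found' |
  awk -F': ' '{ sub(/ extent.*/, "", $2); n += $2; c++ }
              END { printf "%.2f extents per object\n", n / c }'
# prints 1.33 extents per object
```

In practice the input would come from something like `find <osd-data-dir> -type f -print0 | xargs -0 filefrag` rather than the printf sample.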
>> As far as I understand the documentation, if you disable the btrfs
>> snapshot functionality the writeahead journal is activated.
>> http://ceph.newdream.net/wiki/Ceph.conf
>> And I get this in the logs.
>> mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not
>> enabled
>>
>> May I ask what kind of problems you had with ext4? Because I am
>> looking into this direction as well.
>
> You can read about our ext4 problems here:
>
> http://marc.info/?l=ceph-devel&m=131201869703245&w=2
I can still reproduce the bug with v3.1-rc9.
>
> Our bug report with RedHat didn't make any progress for a long time,
> but last week RedHat made two suggestions:
>
> - If you configure ceph with 'filestore flusher = false', do you see
> any different behavior?
> - If you mount with -o noauto_da_alloc does it change anything?
>
> Since I have just migrated to btrfs, it is hard for me to check this,
> but I'll try to do it as soon as I can get hold of some extra
> hardware.
>
I can check this, I have a spare cluster at the moment.
> Regards,
> Christian
* Re: OSD blocked for more than 120 seconds
From: Tomasz Paszkowski @ 2011-10-17 12:05 UTC (permalink / raw)
To: Martin Mailand; +Cc: chb, Wido den Hollander, ceph-devel
Hi,
It seems that ext4 and btrfs are not to be considered stable for
now. Could anyone confirm that
ext3 is the best choice at the moment?
On 17 October 2011 13:49, Martin Mailand <martin@tuxadero.com> wrote:
> On 17.10.2011 11:40, Christian Brunner wrote:
>>
>> 2011/10/15 Martin Mailand<martin@tuxadero.com>:
>>>
>>> Hi Christian,
>>> I have a very similar experience; I also used Josef's tree with btrfs
>>> snaps = 0. The next problem I had then was excessive fragmentation, so I
>>> used this patch http://marc.info/?l=linux-btrfs&m=131495014823121&w=2 and
>>> changed the btrfs options (btrfs options = noatime,nodatacow,autodefrag),
>>> which kept the fragmentation under control.
>>> But even with this setup, after a few days the load on the OSD is
>>> unbearable.
>>
>> How did you find out about our fragmentation issues? Was it just a
>> performance problem?
>>
>
> I used filefrag to show the number of extents; after the patch I have on
> average 1.14 extents per 4MB ceph object on the OSD.
>
>>> As far as I understood the documentation, if you disable the btrfs
>>> snapshot functionality, the writeahead journal is activated.
>>> http://ceph.newdream.net/wiki/Ceph.conf
>>> And I get this in the logs.
>>> mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is
>>> not enabled
>>>
>>> May I ask what kind of problems you had with ext4? Because I am
>>> looking into this direction as well.
>>
>> You can read about our ext4 problems here:
>>
>> http://marc.info/?l=ceph-devel&m=131201869703245&w=2
>
> I can still reproduce the bug with v3.1-rc9.
>
>>
>> Our bug report with RedHat didn't make any progress for a long time,
>> but last week RedHat made two suggestions:
>>
>> - If you configure ceph with 'filestore flusher = false', do you see
>> any different behavior?
>> - If you mount with -o noauto_da_alloc does it change anything?
>>
>> Since I have just migrated to btrfs, I have some trouble checking this,
>> but I'll try to do so as soon as I can get hold of some extra
>> hardware.
>>
> I can check this, I have a spare cluster at the moment.
>
>> Regards,
>> Christian
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Tomasz Paszkowski
SS7, Asterisk, SAN, Datacenter, Cloud Computing
+48500166299
* Re: OSD blocked for more than 120 seconds
2011-10-17 12:05 ` Tomasz Paszkowski
@ 2011-10-17 13:21 ` Martin Mailand
0 siblings, 0 replies; 12+ messages in thread
From: Martin Mailand @ 2011-10-17 13:21 UTC (permalink / raw)
To: Tomasz Paszkowski; +Cc: chb, Wido den Hollander, ceph-devel
On 17.10.2011 14:05, Tomasz Paszkowski wrote:
> Hi,
>
> It seems that ext4 and btrfs are not to be considered stable for
> now. Could anyone confirm that
> ext3 is the best choice at the moment?
Hi,
I did a quick test with ext3, and it did not look very good.
After a few minutes, one of the OSDs failed with this message.
[315274.737204] kjournald starting. Commit interval 5 seconds
[315274.737919] EXT3-fs (sdb): using internal journal
[315274.737929] EXT3-fs (sdb): mounted filesystem with ordered data mode
[317040.890148] INFO: task ceph-osd:18032 blocked for more than 120 seconds.
[317040.905855] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[317040.923801] ceph-osd D ffff880114c8b1a0 0 18032 1
0x00000000
[317040.923812] ffff88010f2e3cb8 0000000000000086 ffff88010f2e3cb8
ffff88010f2e3cb8
[317040.923821] ffff88011ffdff08 ffff88010f2e3fd8 ffff88010f2e2000
ffff88010f2e3fd8
[317040.923830] ffff880116dadbc0 ffff880114c8ade0 ffff88010f2e3cd8
ffffffff8110d500
[317040.923847] Call Trace:
[317040.923865] [<ffffffff8110d500>] ? find_get_pages_tag+0x40/0x130
[317040.923876] [<ffffffff815d93df>] schedule+0x3f/0x60
[317040.923884] [<ffffffff815d99ed>] schedule_timeout+0x26d/0x2e0
[317040.923893] [<ffffffff8101a725>] ? native_sched_clock+0x15/0x70
[317040.923899] [<ffffffff8101a789>] ? sched_clock+0x9/0x10
[317040.923908] [<ffffffff8108d465>] ? sched_clock_local+0x25/0x90
[317040.923916] [<ffffffff815d9219>] wait_for_common+0xd9/0x180
[317040.923924] [<ffffffff8105bbc0>] ? try_to_wake_up+0x2b0/0x2b0
[317040.923932] [<ffffffff815d939d>] wait_for_completion+0x1d/0x20
[317040.923941] [<ffffffff8118d652>] sync_inodes_sb+0x92/0x1c0
[317040.923949] [<ffffffff81192440>] ? __sync_filesystem+0x90/0x90
[317040.923956] [<ffffffff81192430>] __sync_filesystem+0x80/0x90
[317040.923963] [<ffffffff8119245f>] sync_one_sb+0x1f/0x30
[317040.923972] [<ffffffff81169268>] iterate_supers+0xa8/0x100
[317040.923979] [<ffffffff81192360>] sync_filesystems+0x20/0x30
[317040.923985] [<ffffffff81192501>] sys_sync+0x21/0x40
[317040.923995] [<ffffffff815e37c2>] system_call_fastpath+0x16/0x1b
Best Regards,
martin
* Re: OSD blocked for more than 120 seconds
2011-10-17 9:40 ` Christian Brunner
2011-10-17 11:49 ` Martin Mailand
@ 2011-10-17 14:13 ` Martin Mailand
2011-10-17 15:31 ` Sage Weil
1 sibling, 1 reply; 12+ messages in thread
From: Martin Mailand @ 2011-10-17 14:13 UTC (permalink / raw)
To: chb; +Cc: Wido den Hollander, ceph-devel
On 17.10.2011 11:40, Christian Brunner wrote:
> Our bug report with RedHat didn't make any progress for a long time,
> but last week RedHat made two suggestions:
>
> - If you configure ceph with 'filestore flusher = false', do you see
> any different behavior?
> - If you mount with -o noauto_da_alloc does it change anything?
Hi,
after a quick test I think 'filestore flusher = false' did the trick.
What does it do?
Best Regards,
martin
* Re: OSD blocked for more than 120 seconds
2011-10-17 14:13 ` Martin Mailand
@ 2011-10-17 15:31 ` Sage Weil
2011-10-17 18:06 ` Martin Mailand
0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2011-10-17 15:31 UTC (permalink / raw)
To: Martin Mailand; +Cc: chb, Wido den Hollander, ceph-devel
On Mon, 17 Oct 2011, Martin Mailand wrote:
> On 17.10.2011 11:40, Christian Brunner wrote:
> > Our bug report with RedHat didn't make any progress for a long time,
> > but last week RedHat made two suggestions:
> >
> > - If you configure ceph with 'filestore flusher = false', do you see
> > any different behavior?
> > - If you mount with -o noauto_da_alloc does it change anything?
>
> Hi,
> after a quick test I think 'filestore flusher = false' did the trick.
> What does it do?
Does it fix your hang (previous email), or the subsequent fsck errors?
When filestore flusher = true (default), after every write the fd is
handed off to another thread that uses sync_file_range() to push the data
out to disk quickly before closing the file. The purpose is to limit the
latency for the eventual snapshot or sync. Eric suspected the handoff
between threads may be what was triggering the bug in ext4.
sage
* Re: OSD blocked for more than 120 seconds
2011-10-17 15:31 ` Sage Weil
@ 2011-10-17 18:06 ` Martin Mailand
2011-10-17 18:24 ` Christian Brunner
0 siblings, 1 reply; 12+ messages in thread
From: Martin Mailand @ 2011-10-17 18:06 UTC (permalink / raw)
To: Sage Weil; +Cc: chb, Wido den Hollander, ceph-devel
Hi Sage,
the hang was on btrfs; I do not have a fix for that.
The 'filestore flusher = false' option does fix the ext4 problems, which
Christian reported, but it has quite an impact on the OSD
performance.
The '-o noauto_da_alloc' option did not solve the fsck problem.
Best Regards,
Martin
Sage Weil wrote:
> On Mon, 17 Oct 2011, Martin Mailand wrote:
>> On 17.10.2011 11:40, Christian Brunner wrote:
>>> Our bug report with RedHat didn't make any progress for a long time,
>>> but last week RedHat made two suggestions:
>>>
>>> - If you configure ceph with 'filestore flusher = false', do you see
>>> any different behavior?
>>> - If you mount with -o noauto_da_alloc does it change anything?
>> Hi,
>> after a quick test I think 'filestore flusher = false' did the trick.
>> What does it do?
>
> It fixes your hang (previous email), or the subsequent fsck errors?
>
> When filestore flusher = true (default), after every write the fd is
> handed off to another thread that uses sync_file_range() to push the data
> out to disk quickly before closing the file. The purpose is to limit the
> latency for the eventual snapshot or sync. Eric suspected the handoff
> between threads may be what was triggering the bug in ext4.
>
> sage
* Re: OSD blocked for more than 120 seconds
2011-10-17 18:06 ` Martin Mailand
@ 2011-10-17 18:24 ` Christian Brunner
0 siblings, 0 replies; 12+ messages in thread
From: Christian Brunner @ 2011-10-17 18:24 UTC (permalink / raw)
To: martin; +Cc: Sage Weil, Wido den Hollander, ceph-devel
2011/10/17 Martin Mailand <martin@tuxadero.com>:
> Hi Sage,
> the hang was on btrfs; I do not have a fix for that.
>
> The 'filestore flusher = false' option does fix the ext4 problems, which
> Christian reported, but it has quite an impact on the OSD
> performance.
> The '-o noauto_da_alloc' option did not solve the fsck problem.
Thanks for testing. I'll report this back to RedHat tomorrow; maybe
Eric has an idea of what causes the problem in this case.
Regards,
Christian
>
> Best Regards,
> Martin
>
>
> Sage Weil wrote:
>>
>> On Mon, 17 Oct 2011, Martin Mailand wrote:
>>>
>>> On 17.10.2011 11:40, Christian Brunner wrote:
>>>>
>>>> Our bug report with RedHat didn't make any progress for a long time,
>>>> but last week RedHat made two suggestions:
>>>>
>>>> - If you configure ceph with 'filestore flusher = false', do you see
>>>> any different behavior?
>>>> - If you mount with -o noauto_da_alloc does it change anything?
>>>
>>> Hi,
>>> after a quick test I think 'filestore flusher = false' did the trick.
>>> What does it do?
>>
>> It fixes your hang (previous email), or the subsequent fsck errors?
>>
>> When filestore flusher = true (default), after every write the fd is
>> handed off to another thread that uses sync_file_range() to push the data
>> out to disk quickly before closing the file. The purpose is to limit the
>> latency for the eventual snapshot or sync. Eric suspected the handoff
>> between threads may be what was triggering the bug in ext4.
>>
>> sage
>
end of thread, other threads:[~2011-10-17 18:24 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-13 20:39 OSD blocked for more than 120 seconds Martin Mailand
2011-10-14 9:38 ` Wido den Hollander
2011-10-15 19:33 ` Christian Brunner
2011-10-15 20:01 ` Martin Mailand
2011-10-17 9:40 ` Christian Brunner
2011-10-17 11:49 ` Martin Mailand
2011-10-17 12:05 ` Tomasz Paszkowski
2011-10-17 13:21 ` Martin Mailand
2011-10-17 14:13 ` Martin Mailand
2011-10-17 15:31 ` Sage Weil
2011-10-17 18:06 ` Martin Mailand
2011-10-17 18:24 ` Christian Brunner