* Write operation is stuck
@ 2010-08-27 12:18 Bogdan Lobodzinski
  2010-08-27 15:42 ` Wido den Hollander
  2010-08-27 16:09 ` Sage Weil
  0 siblings, 2 replies; 27+ messages in thread
From: Bogdan Lobodzinski @ 2010-08-27 12:18 UTC (permalink / raw)
  To: ceph-devel

Hello,

I am working with ceph on my test configuration
(3 nodes, Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP).
After starting
svn co https://root.cern.ch/svn/root/trunk root

in the /ceph directory, the command becomes stuck, and so do these processes:
root      5303  0.0  0.0      0     0 ?        D    Aug26   0:00 [kjournald]
root     30181  0.0  0.0   6972  2056 pts/1    D+   13:46   0:00 /usr//bin/cosd -i 2 -c /etc/ceph/ceph.conf

Any mount or unmount also goes into state D.
Ceph behaves this way permanently once the command is started.
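
For readers who want to spot such tasks quickly: a `D` in the STAT column marks uninterruptible sleep. The sketch below is only an illustration and is not part of ceph; the helper name `d_state_processes` is hypothetical, it assumes `ps aux` column order (STAT is the eighth column), and the `/sbin/init` line in the sample is made up for contrast.

```python
def d_state_processes(ps_text):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    stuck = []
    for line in ps_text.splitlines():
        fields = line.split(None, 10)  # keep COMMAND (with its spaces) as one field
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10].strip()))
    return stuck


# The first two lines are taken from the report above; the init line is invented.
ps_sample = """\
root      5303  0.0  0.0      0     0 ?        D    Aug26   0:00 [kjournald]
root     30181  0.0  0.0   6972  2056 pts/1    D+   13:46   0:00 /usr//bin/cosd -i 2 -c /etc/ceph/ceph.conf
root         1  0.0  0.0   2804  1600 ?        Ss   Aug26   0:01 /sbin/init
"""

for pid, command in d_state_processes(ps_sample):
    print(pid, command)
```

On a live system the same function could be fed the output of `ps aux` captured with `subprocess.run`.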

dmesg shows:
-------------
[99048.567704] ------------[ cut here ]------------
[99048.568767] kernel BUG at /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
[99048.568767] invalid opcode: 0000 [#1] SMP
[99048.568767] last sysfs file: /sys/devices/pci0000:00/0000:00:00.0/device
[99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas usbhid mptsas mptscsih mptbase scsi_transport_sas
[99048.596652]
[99048.596652] Pid: 6258, comm: cosd Tainted: P (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950
[99048.596652] EIP: 0060:[<c026dc8d>] EFLAGS: 00210296 CPU: 3
[99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
[99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000
[99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14
[99048.596652]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300 task.ti=f5cca000)
[99048.596652] Stack:
[99048.596652]  00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c f6dd5494 02147fff
[99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428 f1058500 00000000
[99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0 f5ccbcb4 f5ccbc90
[99048.596652] Call Trace:
[99048.596652]  [<c026cc88>] ? read_block_bitmap+0x48/0x160
[99048.596652]  [<c026e048>] ? ext3_new_blocks+0x228/0x6c0
[99048.596652]  [<c024fbd7>] ? mb_cache_entry_find_first+0x67/0x80
[99048.596652]  [<c026e505>] ? ext3_new_block+0x25/0x30
[99048.596652]  [<c02809a4>] ? ext3_xattr_block_set+0x554/0x670
[99048.596652]  [<c027f589>] ? ext3_xattr_set_entry+0x29/0x350
[99048.596652]  [<c0280d8b>] ? ext3_xattr_set_handle+0x2cb/0x3e0
[99048.596652]  [<c0280f15>] ? ext3_xattr_set+0x75/0xc0
[99048.596652]  [<c0280fd6>] ? ext3_xattr_user_set+0x76/0x80
[99048.596652]  [<c022dd8c>] ? generic_setxattr+0x9c/0xb0
[99048.596652]  [<c022dcf0>] ? generic_setxattr+0x0/0xb0
[99048.596652]  [<c022e984>] ? __vfs_setxattr_noperm+0x44/0x160
[99048.596652]  [<c02fed4c>] ? cap_inode_setxattr+0x2c/0x60
[99048.596652]  [<c022eb31>] ? vfs_setxattr+0x91/0xa0
[99048.596652]  [<c022ebf8>] ? setxattr+0xb8/0x110
[99048.596652]  [<c021d512>] ? __link_path_walk+0x632/0xca0
[99048.596652]  [<c014e369>] ? enqueue_task_fair+0x39/0x80
[99048.596652]  [<c022a9bf>] ? mntput_no_expire+0x1f/0xe0
[99048.596652]  [<c022a9bf>] ? mntput_no_expire+0x1f/0xe0
[99048.596652]  [<c021be45>] ? path_put+0x25/0x30
[99048.596652]  [<c021ba8b>] ? putname+0x2b/0x40
[99048.596652]  [<c021ea6a>] ? user_path_at+0x4a/0x80
[99048.596652]  [<c0183242>] ? sys_futex+0x72/0x120
[99048.596652]  [<c022ee13>] ? sys_setxattr+0x83/0x90
[99048.596652]  [<c0109763>] ? sysenter_do_call+0x12/0x28
[99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1 ff<0f> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53
[99048.596652] EIP: [<c026dc8d>] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 SS:ESP 0068:f5ccbc14
[99049.044090] ---[ end trace 35860103963ee444 ]---
h1farm184#
--------------------
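
A trace like the one above can be condensed to the three facts that matter for a bug report: where the BUG fired, which function was executing, and which call chain led there. The following is a minimal sketch under stated assumptions, not an official tool; `summarize_oops` and the regular expressions are hypothetical names written for this illustration, and they only target the 2.6.3x-era i386 oops layout shown here.

```python
import re

# Patterns for the oops layout shown above (assumption: 2.6.3x i386 format).
BUG_RE = re.compile(r"kernel BUG at\s+(\S+):(\d+)!")
EIP_RE = re.compile(r"EIP is at (\w+)\+0x")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+\??\s*(\w+)\+0x")


def summarize_oops(dmesg_text):
    """Return ((source_file, line), faulting_function, call_trace_functions)."""
    bug = BUG_RE.search(dmesg_text)  # \s+ also spans a mail-wrapped line break
    location = (bug.group(1), int(bug.group(2))) if bug else None
    eip = EIP_RE.search(dmesg_text)
    faulting = eip.group(1) if eip else None

    frames = []
    in_trace = False
    for line in dmesg_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
        elif "Code:" in line:
            in_trace = False  # the trace section ends at the Code: line
        elif in_trace:
            match = FRAME_RE.search(line)
            if match:
                frames.append(match.group(1))
    return location, faulting, frames


# Shortened excerpt of the trace above.
oops_sample = """\
[99048.568767] kernel BUG at
/build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
[99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
[99048.596652] Call Trace:
[99048.596652]  [<c026e505>] ? ext3_new_block+0x25/0x30
[99048.596652]  [<c02809a4>] ? ext3_xattr_block_set+0x554/0x670
[99048.596652]  [<c022ee13>] ? sys_setxattr+0x83/0x90
[99048.596652] Code: 83 3a ff ff ff 90
"""

location, faulting, frames = summarize_oops(oops_sample)
print(location, faulting, frames)
```

Read bottom-up, the frames show a `setxattr` system call from cosd reaching the ext3 block allocator through `ext3_xattr_block_set`.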

my ceph.conf is:
-------
[global]
       pid file = /var/run/ceph/$name.pid
       debug ms = 1
       keyring = /etc/ceph/keyring.bin
; monitors
[mon]
       ;Directory for monitor files
       mon data = /x02/mon$id
       debug mon = 20
       debug paxos = 20
       mon lease wiggle room = 0.5

[mon0]
       host = h1farm182
       mon addr = xxx.xxx.xx.116:6789
[mon1]
       host = h1farm183
       mon addr = xxx.xxx.xx.117:6789
; metadata servers
[mds]
       debug mds = 20
       mds log max segments = 2
       keyring = /etc/ceph/keyring.$name
[mds0]
       host = h1farm182
[mds1]
       host = h1farm183
[osd]
       sudo = true
       osd data = /x02/osd$id
       osd journal = /x02/osd$id/journal
       osd journal size = 100
       keyring = /etc/ceph/keyring.$name
       debug osd = 20
       debug journal = 20
       debug filestore = 20
       ;osd journal size = 100
[osd0]
       host = h1farm182
[osd1]
       host = h1farm183
[osd2]
       host = h1farm184

------- 

Any idea how to improve the situation?



* Re: Write operation is stuck
  2010-08-27 12:18 Write operation is stuck Bogdan Lobodzinski
@ 2010-08-27 15:42 ` Wido den Hollander
  2010-08-27 16:09 ` Sage Weil
  1 sibling, 0 replies; 27+ messages in thread
From: Wido den Hollander @ 2010-08-27 15:42 UTC (permalink / raw)
  To: Bogdan Lobodzinski; +Cc: ceph-devel

Hi Bogdan,

Are you running your OSD data on ext3? It seems that you are hitting
some ext3 bug.

Could you try switching to btrfs? I ask because ext is not yet fully
supported.
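
One way to answer the filesystem question is to look up the mount that holds the `osd data` directory in `/proc/mounts`. This is only a sketch: `fs_type_for` is a hypothetical helper, and the device names in the sample (`/dev/sda1`, `/dev/sdb1`) are invented, since the thread does not say which devices back `/x02`.

```python
import os


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing `path`.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # matching how a later mount hides an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Invented sample; only the /x02 path comes from the ceph.conf in this thread.
mounts_sample = """\
rootfs / rootfs rw 0 0
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sdb1 /x02 ext3 rw 0 0
"""

print(fs_type_for("/x02/osd2", mounts_sample))
```

On a real node the second argument would be `open("/proc/mounts").read()`.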

Wido


* Re: Write operation is stuck
  2010-08-27 12:18 Write operation is stuck Bogdan Lobodzinski
  2010-08-27 15:42 ` Wido den Hollander
@ 2010-08-27 16:09 ` Sage Weil
  2010-08-30 15:32   ` Bogdan Lobodzinski
  1 sibling, 1 reply; 27+ messages in thread
From: Sage Weil @ 2010-08-27 16:09 UTC (permalink / raw)
  To: Bogdan Lobodzinski; +Cc: ceph-devel

Hi Bogdan,

This is a bug in the ext3 xattr code.  It seems to be gone in 2.6.34 and 
later.  Or, you can switch to btrfs!
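
The trace enters through `sys_setxattr` and `ext3_xattr_user_set`, so the failing operation is a `user.*` extended attribute being set. The sketch below merely exercises that system call from user space; it is an assumption that this resembles what cosd does, it is not a confirmed reproducer for the BUG, and `set_many_xattrs` and the `user.sketch.*` attribute names are hypothetical. If you point it at a file on an affected ext3 mount, use a scratch machine.

```python
import os
import tempfile


def set_many_xattrs(path, count=64, value_size=512):
    """Set up to `count` user.* xattrs of `value_size` bytes; return how many succeeded."""
    done = 0
    for i in range(count):
        try:
            os.setxattr(path, "user.sketch.%d" % i, b"x" * value_size)
        except OSError:
            # For example: user xattrs unsupported on this filesystem,
            # or the per-file xattr space is exhausted.
            break
        done += 1
    return done


# Harmless demonstration on a temporary file; small sizes on purpose.
with tempfile.NamedTemporaryFile() as scratch:
    attrs_set = set_many_xattrs(scratch.name, count=4, value_size=16)
print("xattrs set:", attrs_set)
```

`os.setxattr` is Linux-only; a result of 0 usually means the filesystem holding the temporary file does not accept `user.*` attributes.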

sage



* Re: Write operation is stuck
  2010-08-27 16:09 ` Sage Weil
@ 2010-08-30 15:32   ` Bogdan Lobodzinski
  2010-08-30 19:39     ` Sage Weil
  0 siblings, 1 reply; 27+ messages in thread
From: Bogdan Lobodzinski @ 2010-08-30 15:32 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel


Hello Sage,

I moved to kernel 2.6.35, keeping the ext3 filesystem.
After executing the same command:
svn co https://root.cern.ch/svn/root/trunk root

the system is dead again. The command and kjournald are stuck:
bogdan  8539  0.9  0.6  31168 22040 pts/0  DL+  16:44  0:21 svn co https://root.cern.ch/svn/root/trunk root
root    802   0.0  0.0      0     0 ?        D  12:59  0:01 [kjournald]

It looks like the bug is not fixed; dmesg shows:
---------
[14325.304068] kernel BUG at /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385!
[14325.304191] invalid opcode: 0000 [#1] SMP
[14325.304263] last sysfs file: /sys/devices/pci0000:00/0000:00:00.0/device
[14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp serio_raw mptsas mptscsih mptbase scsi_transport_sas
[14325.304266]
[14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic #20~lucid2-Ubuntu 0DT097/PowerEdge 1950
[14325.304266] EIP: 0060:[<c0274a4d>] EFLAGS: 00210286 CPU: 1
[14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
[14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000
[14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10
[14325.304266]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70 task.ti=f5822000)
[14325.304266] Stack:
[14325.304266]  000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 c8641454 007b7fff
[14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6 c4063ec0 00000000
[14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8 f5823cac c0256017
[14325.304266] Call Trace:
[14325.304266]  [<c0273a58>] ? read_block_bitmap+0x48/0x160
[14325.304266]  [<c0274daf>] ? ext3_new_blocks+0x1ff/0x610
[14325.304266]  [<c0256017>] ? mb_cache_entry_find_first+0x67/0x80
[14325.304266]  [<c02751e5>] ? ext3_new_block+0x25/0x30
[14325.304266]  [<c0287721>] ? ext3_xattr_block_set+0x481/0x550
[14325.304266]  [<c0286490>] ? ext3_xattr_set_entry+0x20/0x2f0
[14325.304266]  [<c0287b0b>] ? ext3_xattr_set_handle+0x31b/0x400
[14325.304266]  [<c0287c65>] ? ext3_xattr_set+0x75/0xc0
[14325.304266]  [<c0287d24>] ? ext3_xattr_user_set+0x74/0x80
[14325.304266]  [<c023348b>] ? generic_setxattr+0x9b/0xb0
[14325.304266]  [<c02333f0>] ? generic_setxattr+0x0/0xb0
[14325.304266]  [<c0234084>] ? __vfs_setxattr_noperm+0x44/0x150
[14325.304266]  [<c03017dc>] ? cap_inode_setxattr+0x2c/0x60
[14325.304266]  [<c0234221>] ? vfs_setxattr+0x91/0xa0
[14325.304266]  [<c02342e8>] ? setxattr+0xb8/0x110
[14325.304266]  [<c0221d0e>] ? path_to_nameidata+0x1e/0x50
[14325.304266]  [<c0223492>] ? link_path_walk+0x412/0x890
[14325.304266]  [<c013a159>] ? enqueue_task_fair+0x39/0x80
[14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
[14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
[14325.304266]  [<c022168b>] ? putname+0x2b/0x40
[14325.304266]  [<c022470a>] ? user_path_at+0x4a/0x80
[14325.304266]  [<c0179902>] ? sys_futex+0x72/0x120
[14325.304266]  [<c0234503>] ? sys_setxattr+0x83/0x90
[14325.304266]  [<c05c9bb4>] ? syscall_call+0x7/0xb
[14325.304266]  [<c05c0000>] ? cache_add_dev+0x73/0x195
[14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 32 ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff <0f> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b
[14325.304266] EIP: [<c0274a4d>] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 SS:ESP 0068:f5823c10
[14325.326777] ---[ end trace 53e0b3b55af7a83c ]---
[14384.001261] ceph: mds0 caps stale
[14413.616132] ceph:  tid 33594 timed out on osd2, will reset osd
[14628.992279] ceph: mds0 hung
---------

As a next step I will try btrfs.

Cheers,

Bogdan



* Re: Write operation is stuck
  2010-08-30 15:32   ` Bogdan Lobodzinski
@ 2010-08-30 19:39     ` Sage Weil
  2010-08-31  7:56       ` Bogdan Lobodzinski
  0 siblings, 1 reply; 27+ messages in thread
From: Sage Weil @ 2010-08-30 19:39 UTC (permalink / raw)
  To: Bogdan Lobodzinski; +Cc: ceph-devel

On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote:
> 
> Hello Sage,
> 
> I moved to the kernel 2.6.35, keeping ext3 filesystem.
> After executing the same command:
> svn co https://root.cern.ch/svn/root/trunk root
> 
> System is again dead. The command and kjournald are stuck
> bogdan  8539  0.9  0.6  31168 22040 pts/0  DL+  16:44  0:21 svn co https://root.cern.ch/svn/root/trunk root
> root    802   0.0  0.0      0     0 ?        D  12:59  0:01 [kjournald]

Hmm.  Have you tried ext4?

I stopped seeing this on my own machine with recent kernels, but it looks 
like it isn't in fact fixed.  This should be reported to the ext4 list.  
Are you running ceph via vstart.sh or a custom ceph.conf?

sage

> 
> Looks like the bug is not fixed, dmesg shows:
> ---------
> [14325.304068] kernel BUG at
> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385!
> [14325.304191] invalid opcode: 0000 [#1] SMP
> [14325.304263] last sysfs file: /sys/devices/pci0000:00/0000:00:00.0/device
> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart
> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp
> serio_raw mptsas mptscsih mptbase scsi_transport_sas
> [14325.304266]
> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic
> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950
> [14325.304266] EIP: 0060:[<c0274a4d>] EFLAGS: 00210286 CPU: 1
> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000
> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10
> [14325.304266]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70
> task.ti=f5822000)
> [14325.304266] Stack:
> [14325.304266]  000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 c8641454
> 007b7fff
> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6
> c4063ec0 00000000
> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8
> f5823cac c0256017
> [14325.304266] Call Trace:
> [14325.304266]  [<c0273a58>] ? read_block_bitmap+0x48/0x160
> [14325.304266]  [<c0274daf>] ? ext3_new_blocks+0x1ff/0x610
> [14325.304266]  [<c0256017>] ? mb_cache_entry_find_first+0x67/0x80
> [14325.304266]  [<c02751e5>] ? ext3_new_block+0x25/0x30
> [14325.304266]  [<c0287721>] ? ext3_xattr_block_set+0x481/0x550
> [14325.304266]  [<c0286490>] ? ext3_xattr_set_entry+0x20/0x2f0
> [14325.304266]  [<c0287b0b>] ? ext3_xattr_set_handle+0x31b/0x400
> [14325.304266]  [<c0287c65>] ? ext3_xattr_set+0x75/0xc0
> [14325.304266]  [<c0287d24>] ? ext3_xattr_user_set+0x74/0x80
> [14325.304266]  [<c023348b>] ? generic_setxattr+0x9b/0xb0
> [14325.304266]  [<c02333f0>] ? generic_setxattr+0x0/0xb0
> [14325.304266]  [<c0234084>] ? __vfs_setxattr_noperm+0x44/0x150
> [14325.304266]  [<c03017dc>] ? cap_inode_setxattr+0x2c/0x60
> [14325.304266]  [<c0234221>] ? vfs_setxattr+0x91/0xa0
> [14325.304266]  [<c02342e8>] ? setxattr+0xb8/0x110
> [14325.304266]  [<c0221d0e>] ? path_to_nameidata+0x1e/0x50
> [14325.304266]  [<c0223492>] ? link_path_walk+0x412/0x890
> [14325.304266]  [<c013a159>] ? enqueue_task_fair+0x39/0x80
> [14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
> [14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
> [14325.304266]  [<c022168b>] ? putname+0x2b/0x40
> [14325.304266]  [<c022470a>] ? user_path_at+0x4a/0x80
> [14325.304266]  [<c0179902>] ? sys_futex+0x72/0x120
> [14325.304266]  [<c0234503>] ? sys_setxattr+0x83/0x90
> [14325.304266]  [<c05c9bb4>] ? syscall_call+0x7/0xb
> [14325.304266]  [<c05c0000>] ? cache_add_dev+0x73/0x195
> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 32
> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff <0f>
> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b
> [14325.304266] EIP: [<c0274a4d>] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
> SS:ESP 0068:f5823c10
> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]---
> [14384.001261] ceph: mds0 caps stale
> [14413.616132] ceph:  tid 33594 timed out on osd2, will reset osd
> [14628.992279] ceph: mds0 hung
> ---------
> 
> as a next step I wil try to use btrfs .
> 
> Cheers,
> 
> Bogdan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-08-30 19:39     ` Sage Weil
@ 2010-08-31  7:56       ` Bogdan Lobodzinski
  2010-09-01 15:21         ` Bogdan Lobodzinski
  0 siblings, 1 reply; 27+ messages in thread
From: Bogdan Lobodzinski @ 2010-08-31  7:56 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel


Hello Sage,

On Mon, 30 Aug 2010, Sage Weil wrote:

> On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote:
>>
>> Hello Sage,
>>
>> I moved to the kernel 2.6.35, keeping ext3 filesystem.
>> After executing the same command:
>> svn co https://root.cern.ch/svn/root/trunk root
>>
>> System is again dead. The command and kjournald are stuck
>> bogdan  8539  0.9  0.6  31168 22040 pts/0  DL+  16:44  0:21 svn co
>> https://root.cern.ch/svn/root/trunk root
>> root    802   0.0  0.0      0     0 ?        D  12:59  0:01 [kjournald]
>
> Hmm.  Have you tried ext4?
>
> I stopped seeing this on my own machine with recent kernels, but it looks
> like it isn't in fact fixed.  This should be reported to the ext4 list.
> Are you running ceph via vstart.sh or a custom ceph.conf?
I am using vstart.sh from the ceph-0.21.tar.gz source tarball, which I
compiled myself (http://ceph.newdream.net/download/),
and the client from
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git

Cheers,

Bogdan

>
> sage
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-08-31  7:56       ` Bogdan Lobodzinski
@ 2010-09-01 15:21         ` Bogdan Lobodzinski
  2010-09-01 19:29           ` Wido den Hollander
  0 siblings, 1 reply; 27+ messages in thread
From: Bogdan Lobodzinski @ 2010-09-01 15:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel


Hello Sage,

After replacing ext3 with btrfs, my ceph test bed survived my test command:
svn co https://root.cern.ch/svn/root/trunk root

I didn't try ext4.

However, I made a few changes to my initial ceph.conf.
Could you please check whether this configuration is reasonable?
Is it correct to set the "osd journal" location as done below?

My new ceph.conf:
-----------
[global]
        pid file = /var/run/ceph/$name.pid
        debug ms = 1
        keyring = /etc/ceph/keyring.bin
[mon]
        mon data = /x01/mon$id
        debug mon = 20
        debug paxos = 20
        mon lease wiggle room = 0.5
[mon0]
        host = h1farm182
        mon addr = xxx.xxx.xxx.116:6789
[mon1]
        host = h1farm183
        mon addr = xxx.xxx.xxx.117:6789
[mds]
        debug mds = 10
        mds log max segments = 2
        keyring = /etc/ceph/keyring.$name
[mds0]
        host = h1farm182
[mds1]
        host = h1farm183
[osd]
        sudo = true
        keyring = /etc/ceph/keyring.$name
        osd data = /x02/osd$id
        osd journal = /x02/osd$id/journal
        osd journal size = 100
        debug osd = 20
        debug journal = 20
        debug filestore = 20
[osd0]
        host = h1farm183
        btrfs devs = /dev/sdb1
[osd1]
        host = h1farm184
        btrfs devs = /dev/sdb1
-----------

Thank you for your help,

Cheers,

Bogdan



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-09-01 15:21         ` Bogdan Lobodzinski
@ 2010-09-01 19:29           ` Wido den Hollander
  2010-09-03 15:02             ` Bogdan Lobodzinski
  0 siblings, 1 reply; 27+ messages in thread
From: Wido den Hollander @ 2010-09-01 19:29 UTC (permalink / raw)
  To: Bogdan Lobodzinski; +Cc: Sage Weil, ceph-devel

Hi Bogdan,

Yes, you can place your journal in a file; that is no problem.

Performance-wise, you might want to use a block device (or partition) on
a different device than the one holding your data.
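
A minimal sketch of that layout, reusing the [osd] section from the
configuration quoted below. The partition /dev/sdc1 is a placeholder and not
a device named in this thread; whether "osd journal size" is still needed
for a raw device should be checked against the documentation for your
version.

```ini
[osd]
        osd data = /x02/osd$id
        ; journal on its own partition instead of a file under "osd data"
        ; (/dev/sdc1 is an assumed name; substitute a real spare partition)
        osd journal = /dev/sdc1
```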

Wido

On Wed, 2010-09-01 at 17:21 +0200, Bogdan Lobodzinski wrote:
> Hello Sage,
> 
> After replacing ext3 with btrfs, my ceph test bed survived my test command:
> svn co https://root.cern.ch/svn/root/trunk root
> 
> I didn't try ext4.
> 
> However, I made a few changes to my initial ceph.conf.
> Could you please check whether this configuration is reasonable?
> Is it correct to set the "osd journal" location as done below?
> 
> My new ceph.conf:
> -----------
> [global]
>         pid file = /var/run/ceph/$name.pid
>         debug ms = 1
>         keyring = /etc/ceph/keyring.bin
> [mon]
>         mon data = /x01/mon$id
>         debug mon = 20
>         debug paxos = 20
>         mon lease wiggle room = 0.5
> [mon0]
>         host = h1farm182
>         mon addr = xxx.xxx.xxx.116:6789
> [mon1]
>         host = h1farm183
>         mon addr = xxx.xxx.xxx.117:6789
> [mds]
>         debug mds = 10
>         mds log max segments = 2
>         keyring = /etc/ceph/keyring.$name
> [mds0]
>         host = h1farm182
> [mds1]
>         host = h1farm183
> [osd]
>         sudo = true
>         keyring = /etc/ceph/keyring.$name
>         osd data = /x02/osd$id
>         osd journal = /x02/osd$id/journal
>         osd journal size = 100
>         debug osd = 20
>         debug journal = 20
>         debug filestore = 20
> [osd0]
>         host = h1farm183
>         btrfs devs = /dev/sdb1
> [osd1]
>         host = h1farm184
>         btrfs devs = /dev/sdb1
> -----------
> 
> Thank you for your help,
> 
> Cheers,
> 
> Bogdan
> 
> 
> On Tue, 31 Aug 2010, Bogdan Lobodzinski wrote:
> 
> >
> > Hello Sage,
> >
> > On Mon, 30 Aug 2010, Sage Weil wrote:
> >
> >> On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote:
> >>> 
> >>> Hello Sage,
> >>> 
> >>> I moved to kernel 2.6.35, keeping the ext3 filesystem.
> >>> After executing the same command:
> >>> svn co https://root.cern.ch/svn/root/trunk root
> >>> 
> >>> The system is dead again. The command and kjournald are stuck:
> >>> bogdan  8539  0.9  0.6  31168 22040 pts/0  DL+  16:44  0:21 svn co
> >>> https://root.cern.ch/svn/root/trunk root
> >>> root    802   0.0  0.0      0     0 ?        D  12:59  0:01 [kjournald]
> >> 
> >> Hmm.  Have you tried ext4?
> >> 
> >> I stopped seeing this on my own machine with recent kernels, but it looks
> >> like it isn't in fact fixed.  This should be reported to the ext4 list.
> >> Are you running ceph via vstart.sh or a custom ceph.conf?
> > I am using vstart.sh taken from a source tarball that I compiled myself,
> > ceph-0.21.tar.gz  (http://ceph.newdream.net/download/)
> > and the client from
> > git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git
> >
> > Cheers,
> >
> > Bogdan
> >
> >> 
> >> sage
> >> 
> >>> 
> >>> Looks like the bug is not fixed, dmesg shows:
> >>> ---------
> >>> [14325.304068] kernel BUG at
> >>> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385!
> >>> [14325.304191] invalid opcode: 0000 [#1] SMP
> >>> [14325.304263] last sysfs file: 
> >>> /sys/devices/pci0000:00/0000:00:00.0/device
> >>> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss 
> >>> sunrpc
> >>> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart
> >>> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp
> >>> serio_raw mptsas mptscsih mptbase scsi_transport_sas
> >>> [14325.304266]
> >>> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic
> >>> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950
> >>> [14325.304266] EIP: 0060:[<c0274a4d>] EFLAGS: 00210286 CPU: 1
> >>> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
> >>> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000
> >>> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10
> >>> [14325.304266]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> >>> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70
> >>> task.ti=f5822000)
> >>> [14325.304266] Stack:
> >>> [14325.304266]  000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 
> >>> c8641454
> >>> 007b7fff
> >>> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6
> >>> c4063ec0 00000000
> >>> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8
> >>> f5823cac c0256017
> >>> [14325.304266] Call Trace:
> >>> [14325.304266]  [<c0273a58>] ? read_block_bitmap+0x48/0x160
> >>> [14325.304266]  [<c0274daf>] ? ext3_new_blocks+0x1ff/0x610
> >>> [14325.304266]  [<c0256017>] ? mb_cache_entry_find_first+0x67/0x80
> >>> [14325.304266]  [<c02751e5>] ? ext3_new_block+0x25/0x30
> >>> [14325.304266]  [<c0287721>] ? ext3_xattr_block_set+0x481/0x550
> >>> [14325.304266]  [<c0286490>] ? ext3_xattr_set_entry+0x20/0x2f0
> >>> [14325.304266]  [<c0287b0b>] ? ext3_xattr_set_handle+0x31b/0x400
> >>> [14325.304266]  [<c0287c65>] ? ext3_xattr_set+0x75/0xc0
> >>> [14325.304266]  [<c0287d24>] ? ext3_xattr_user_set+0x74/0x80
> >>> [14325.304266]  [<c023348b>] ? generic_setxattr+0x9b/0xb0
> >>> [14325.304266]  [<c02333f0>] ? generic_setxattr+0x0/0xb0
> >>> [14325.304266]  [<c0234084>] ? __vfs_setxattr_noperm+0x44/0x150
> >>> [14325.304266]  [<c03017dc>] ? cap_inode_setxattr+0x2c/0x60
> >>> [14325.304266]  [<c0234221>] ? vfs_setxattr+0x91/0xa0
> >>> [14325.304266]  [<c02342e8>] ? setxattr+0xb8/0x110
> >>> [14325.304266]  [<c0221d0e>] ? path_to_nameidata+0x1e/0x50
> >>> [14325.304266]  [<c0223492>] ? link_path_walk+0x412/0x890
> >>> [14325.304266]  [<c013a159>] ? enqueue_task_fair+0x39/0x80
> >>> [14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
> >>> [14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
> >>> [14325.304266]  [<c022168b>] ? putname+0x2b/0x40
> >>> [14325.304266]  [<c022470a>] ? user_path_at+0x4a/0x80
> >>> [14325.304266]  [<c0179902>] ? sys_futex+0x72/0x120
> >>> [14325.304266]  [<c0234503>] ? sys_setxattr+0x83/0x90
> >>> [14325.304266]  [<c05c9bb4>] ? syscall_call+0x7/0xb
> >>> [14325.304266]  [<c05c0000>] ? cache_add_dev+0x73/0x195
> >>> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 
> >>> 32
> >>> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff 
> >>> <0f>
> >>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b
> >>> [14325.304266] EIP: [<c0274a4d>] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
> >>> SS:ESP 0068:f5823c10
> >>> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]---
> >>> [14384.001261] ceph: mds0 caps stale
> >>> [14413.616132] ceph:  tid 33594 timed out on osd2, will reset osd
> >>> [14628.992279] ceph: mds0 hung
> >>> ---------
> >>> 
> >>> As a next step I will try to use btrfs.
> >>> 
> >>> Cheers,
> >>> 
> >>> Bogdan
> >>> 
> >>> 
> >>> On Fri, 27 Aug 2010, Sage Weil wrote:
> >>> 
> >>>> Hi Bogdan,
> >>>> 
> >>>> This is a bug in the ext3 xattr code.  It seems to be gone in 2.6.34 and
> >>>> later.  Or, you can switch to btrfs!
> >>>> 
> >>>> sage
> >>>> 
> >>>> 
> >>>> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote:
> >>>> 
> >>>>> Hello,
> >>>>> 
> >>>>> working with ceph on my test configuration
> >>>>> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP)
> >>>>> After starting
> >>>>> svn co https://root.cern.ch/svn/root/trunk root
> >>>>> 
> >>>>> in the /ceph directory, the command became stuck, and so did:
> >>>>> root      5303  0.0  0.0      0     0 ?        D    Aug26   0:00
> >>>>> [kjournald]
> >>>>> root     30181  0.0  0.0   6972  2056 pts/1    D+   13:46   0:00
> >>>>> /usr//bin/cosd
> >>>>> -i 2 -c /etc/ceph/ceph.conf
> >>>>> 
> >>>>> Any mount or unmount also goes into state D.
> >>>>> This is permanent behaviour of ceph once the command is started.
> >>>>> 
> >>>>> dmesg shows:
> >>>>> -------------
> >>>>> [99048.567704] ------------[ cut here ]------------
> >>>>> [99048.568767] kernel BUG at
> >>>>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
> >>>>> [99048.568767] invalid opcode: 0000 [#1] SMP
> >>>>> [99048.568767] last sysfs file:
> >>>>> /sys/devices/pci0000:00/0000:00:00.0/device
> >>>>> [99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc
> >>>>> ceph
> >>>>> crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga
> >>>>> vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac
> >>>>> edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas
> >>>>> usbhid mptsas mptscsih mptbase scsi_transport_sas
> >>>>> [99048.596652]
> >>>>> [99048.596652] Pid: 6258, comm: cosd Tainted: P
> >>>>> (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950
> >>>>> [99048.596652] EIP: 0060:[<c026dc8d>] EFLAGS: 00210296 CPU: 3
> >>>>> [99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
> >>>>> [99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000
> >>>>> [99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14
> >>>>> [99048.596652]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> >>>>> [99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300
> >>>>> task.ti=f5cca000)
> >>>>> [99048.596652] Stack:
> >>>>> [99048.596652]  00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c
> >>>>> f6dd5494 02147fff
> >>>>> [99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428
> >>>>> f1058500 00000000
> >>>>> [99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0
> >>>>> f5ccbcb4 f5ccbc90
> >>>>> [99048.596652] Call Trace:
> >>>>> [99048.596652]  [<c026cc88>] ? read_block_bitmap+0x48/0x160
> >>>>> [99048.596652]  [<c026e048>] ? ext3_new_blocks+0x228/0x6c0
> >>>>> [99048.596652]  [<c024fbd7>] ? mb_cache_entry_find_first+0x67/0x80
> >>>>> [99048.596652]  [<c026e505>] ? ext3_new_block+0x25/0x30
> >>>>> [99048.596652]  [<c02809a4>] ? ext3_xattr_block_set+0x554/0x670
> >>>>> [99048.596652]  [<c027f589>] ? ext3_xattr_set_entry+0x29/0x350
> >>>>> [99048.596652]  [<c0280d8b>] ? ext3_xattr_set_handle+0x2cb/0x3e0
> >>>>> [99048.596652]  [<c0280f15>] ? ext3_xattr_set+0x75/0xc0
> >>>>> [99048.596652]  [<c0280fd6>] ? ext3_xattr_user_set+0x76/0x80
> >>>>> [99048.596652]  [<c022dd8c>] ? generic_setxattr+0x9c/0xb0
> >>>>> [99048.596652]  [<c022dcf0>] ? generic_setxattr+0x0/0xb0
> >>>>> [99048.596652]  [<c022e984>] ? __vfs_setxattr_noperm+0x44/0x160
> >>>>> [99048.596652]  [<c02fed4c>] ? cap_inode_setxattr+0x2c/0x60
> >>>>> [99048.596652]  [<c022eb31>] ? vfs_setxattr+0x91/0xa0
> >>>>> [99048.596652]  [<c022ebf8>] ? setxattr+0xb8/0x110
> >>>>> [99048.596652]  [<c021d512>] ? __link_path_walk+0x632/0xca0
> >>>>> [99048.596652]  [<c014e369>] ? enqueue_task_fair+0x39/0x80
> >>>>> [99048.596652]  [<c022a9bf>] ? mntput_no_expire+0x1f/0xe0
> >>>>> [99048.596652]  [<c022a9bf>] ? mntput_no_expire+0x1f/0xe0
> >>>>> [99048.596652]  [<c021be45>] ? path_put+0x25/0x30
> >>>>> [99048.596652]  [<c021ba8b>] ? putname+0x2b/0x40
> >>>>> [99048.596652]  [<c021ea6a>] ? user_path_at+0x4a/0x80
> >>>>> [99048.596652]  [<c0183242>] ? sys_futex+0x72/0x120
> >>>>> [99048.596652]  [<c022ee13>] ? sys_setxattr+0x83/0x90
> >>>>> [99048.596652]  [<c0109763>] ? sysenter_do_call+0x12/0x28
> >>>>> [99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 
> >>>>> 83
> >>>>> 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1
> >>>>> ff<0f>
> >>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53
> >>>>> [99048.596652] EIP: [<c026dc8d>] 
> >>>>> ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
> >>>>> SS:ESP 0068:f5ccbc14
> >>>>> [99049.044090] ---[ end trace 35860103963ee444 ]---
> >>>>> h1farm184#
> >>>>> --------------------
> >>>>> 
> >>>>> my ceph.conf is:
> >>>>> -------
> >>>>> [global]
> >>>>>        pid file = /var/run/ceph/$name.pid
> >>>>>        debug ms = 1
> >>>>>        keyring = /etc/ceph/keyring.bin
> >>>>> ; monitors
> >>>>> [mon]
> >>>>>        ;Directory for monitor files
> >>>>>        mon data = /x02/mon$id
> >>>>>        debug mon = 20
> >>>>>        debug paxos = 20
> >>>>>        mon lease wiggle room = 0.5
> >>>>> 
> >>>>> [mon0]
> >>>>>        host = h1farm182
> >>>>>        mon addr = xxx.xxx.xx.116:6789
> >>>>> [mon1]
> >>>>>        host = h1farm183
> >>>>>        mon addr = xxx.xxx.xx.117:6789
> >>>>> ; metadata servers
> >>>>> [mds]
> >>>>>        debug mds = 20
> >>>>>        mds log max segments = 2
> >>>>>        keyring = /etc/ceph/keyring.$name
> >>>>> [mds0]
> >>>>>        host = h1farm182
> >>>>> [mds1]
> >>>>>        host = h1farm183
> >>>>> [osd]
> >>>>>        sudo = true
> >>>>>        osd data = /x02/osd$id
> >>>>>        osd journal = /x02/osd$id/journal
> >>>>>        osd journal size = 100
> >>>>>        keyring = /etc/ceph/keyring.$name
> >>>>>        debug osd = 20
> >>>>>        debug journal = 20
> >>>>>        debug filestore = 20
> >>>>>        ;osd journal size = 100
> >>>>> [osd0]
> >>>>>        host = h1farm182
> >>>>> [osd1]
> >>>>>        host = h1farm183
> >>>>> [osd2]
> >>>>>        host = h1farm184
> >>>>> 
> >>>>> -------
> >>>>> 
> >>>>> Any idea how to improve the situation?
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>> 
> >>> 
> >> 
> >



* Re: Write operation is stuck
  2010-09-01 19:29           ` Wido den Hollander
@ 2010-09-03 15:02             ` Bogdan Lobodzinski
  2010-09-03 17:10               ` Yehuda Sadeh Weinraub
  0 siblings, 1 reply; 27+ messages in thread
From: Bogdan Lobodzinski @ 2010-09-03 15:02 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: Sage Weil, ceph-devel


Hello all,

let me continue with my troubles; the subject line can stay the same.
As I wrote, my ceph configuration survived my critical test
svn co https://root.cern.ch/svn/root/trunk root
but suddenly, during the night at 5 o'clock, ceph became stuck again,
without any user activity and with no work at all in the /ceph directory.
The node is running as
mds1, mon1, osd0

The system log file reports the following (the problem starts with the entry
"Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale"):
--------
Sep  1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded
Sep  1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
Sep  1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
Sep  1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module.
Sep  1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module.
Sep  1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep  1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
Sep  1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151
Sep  1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
...
Sep  1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1
Sep  1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71
Sep  1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established
Sep  2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale
Sep  2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale
Sep  2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start
Sep  2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P           (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[<c01ea907>] EFLAGS: 00010246 CPU: 1
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000
Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c0138307>] ? kunmap+0x57/0x60
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad7d54>] ? ceph_pagelist_append+0x54/0x110 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ada09d>] ? encode_caps_cb+0x16d/0x1f0 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad89e0>] ? iterate_session_caps+0xa0/0x170 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad9f30>] ? encode_caps_cb+0x0/0x1f0 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9adb46f>] ? send_mds_reconnect+0x23f/0x3b0 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9adb804>] ? ceph_mdsc_handle_map+0x224/0x380 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9addd9e>] ? dispatch+0x8e/0x430 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad7776>] ? con_work+0x1cf6/0x1ed0 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c010807d>] ? __switch_to+0xcd/0x180
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c0146d83>] ? finish_task_switch+0x43/0xc0
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c05b10dc>] ? schedule+0x44c/0x840
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016bbce>] ? run_workqueue+0x8e/0x150
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad5a80>] ? con_work+0x0/0x1ed0 [ceph]
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016bd14>] ? worker_thread+0x84/0xe0
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016fc70>] ? autoremove_wake_function+0x0/0x50
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016bc90>] ? worker_thread+0x0/0xe0
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016f9e4>] ? kthread+0x74/0x80
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016f970>] ? kthread+0x0/0x80
Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c010a4e7>] ? kernel_thread_helper+0x7/0x10
Sep  2 05:45:27 h1farm183 kernel: [72472.304298] ---[ end trace 47e346731d47774d ]---
---

My mds1.log from the node shows:
--------
10.09.02_05:45:15.001538 b5168b70 mds-1.0 beacon_send up:standby seq 11751 (currently up:standby)
10.09.02_05:45:15.001555 b5168b70 -- 131.169.74.117:6800/3679 --> mon1 131.169.74.117:6789/0 -- mdsbeacon(4099/1 up:stand
10.09.02_05:45:19.001663 b5168b70 mds-1.0 beacon_send up:standby seq 11752 (currently up:standby)
10.09.02_05:45:19.001681 b5168b70 -- 131.169.74.117:6800/3679 --> mon1 131.169.74.117:6789/0 -- mdsbeacon(4099/1 up:stand
10.09.02_05:45:19.128037 b5168b70 mds-1.0  last tick was 80.001470 > 5 seconds ago, laggy_until 0.000000, setting laggy f
10.09.02_05:45:19.795620 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12055 ==== mdsmap(e 6) v1 ==
10.09.02_05:45:19.795669 b636cb70 mds-1.0 handle_mds_map epoch 6 from mon1
10.09.02_05:45:19.795697 b636cb70 mds-1.0      my compat compat={},rocompat={},incompat={1=base v0.20}
10.09.02_05:45:19.795708 b636cb70 mds-1.0  mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.09.02_05:45:19.795715 b636cb70 mds0.0 map says i am 131.169.74.117:6800/3679 mds0 state up:replay
10.09.02_05:45:19.795803 b636cb70 mds0.2 handle_mds_map i am now mds0.2
10.09.02_05:45:19.795812 b636cb70 mds0.2 handle_mds_map state change up:standby --> up:replay
10.09.02_05:45:19.795818 b636cb70 mds0.2 replay_start
10.09.02_05:45:19.795825 b636cb70 mds0.2 now replay.  my recovery peers are
10.09.02_05:45:19.795835 b636cb70 mds0.cache set_recovery_set
10.09.02_05:45:19.795856 b636cb70 mds0.2 boot_start 1: opening inotable
10.09.02_05:45:19.795866 b636cb70 mds0.inotable: load
10.09.02_05:45:19.795912 b636cb70 -- 131.169.74.117:6800/3679 --> mon1 131.169.74.117:6789/0 -- mon_subscribe({mdsmap=7+,
10.09.02_05:45:19.795940 b636cb70 mds0.2 boot_start 1: opening sessionmap
10.09.02_05:45:19.795951 b636cb70 mds0.sessionmap load
10.09.02_05:45:19.795975 b636cb70 mds0.2 boot_start 1: opening anchor table
10.09.02_05:45:19.795982 b636cb70 mds0.anchortable: load
10.09.02_05:45:19.795998 b636cb70 mds0.2 boot_start 1: opening snap table
10.09.02_05:45:19.796015 b636cb70 mds0.snaptable: load
10.09.02_05:45:19.796030 b636cb70 mds0.2 boot_start 1: opening mds log
10.09.02_05:45:19.796041 b636cb70 mds0.log open discovering log bounds
10.09.02_05:45:19.796082 b636cb70 mds0.cache handle_mds_failure mds0
10.09.02_05:45:19.796093 b636cb70 mds0.cache handle_mds_failure mds0 : recovery peers are
10.09.02_05:45:19.796101 b636cb70 mds0.cache  wants_resolve
10.09.02_05:45:19.796107 b636cb70 mds0.cache  got_resolve
10.09.02_05:45:19.796112 b636cb70 mds0.cache  rejoin_sent
10.09.02_05:45:19.796117 b636cb70 mds0.cache  rejoin_gather
10.09.02_05:45:19.796123 b636cb70 mds0.cache  rejoin_ack_gather
10.09.02_05:45:19.796133 b636cb70 mds0.migrator handle_mds_failure_or_stop mds0
10.09.02_05:45:19.796164 b636cb70 mds0.cache show_subtrees - no subtrees
10.09.02_05:45:19.796177 b636cb70 mds0.bal check_targets have  need  want
10.09.02_05:45:19.796195 b636cb70 mds0.bal rebalance done
10.09.02_05:45:19.796201 b636cb70 mds0.cache show_subtrees - no subtrees
10.09.02_05:45:19.798127 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12056 ==== osd_map(1,5) v1 =
10.09.02_05:45:19.798152 b636cb70 mds0.2 laggy, deferring osd_map(1,5) v1
10.09.02_05:45:19.798165 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12057 ==== mon_subscribe_ack
10.09.02_05:45:19.984913 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12058 ==== mdsbeacon(4099/1
10.09.02_05:45:19.984951 b636cb70 mds0.2 handle_mds_beacon up:boot seq 2 dne
10.09.02_05:45:19.985185 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12059 ==== mdsbeacon(4099/1
10.09.02_05:45:19.985210 b636cb70 mds0.2 handle_mds_beacon up:standby seq 11730 rtt 88.986215
10.09.02_05:45:19.985245 b5168b70 mds0.2 beacon_kill last_acked_stamp 10.09.02_05:43:50.998994, setting laggy flag.
10.09.02_05:45:19.985293 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12060 ==== mdsbeacon(4099/1
10.09.02_05:45:19.985320 b636cb70 mds0.2 handle_mds_beacon up:standby seq 11731 rtt 84.986197

--------

The node was completely stuck.
Do you know what the reason could be?
Any hints on how to change the configuration are welcome.

Cheers,

Bogdan





On Wed, 1 Sep 2010, Wido den Hollander wrote:

> Hi Bogdan,
>
> Yes, you can place your journal on a file, that is no problem.
>
> Performance-wise, you might want to put the journal on a block device (or
> partition) that sits on a different device than the one holding your data.
>
> Wido
>
> On Wed, 2010-09-01 at 17:21 +0200, Bogdan Lobodzinski wrote:
>> Hello Sage,
>>
>> After replacing ext3 with btrfs, my ceph test bed survived my test command:
>> svn co https://root.cern.ch/svn/root/trunk root
>>
>> I didn't try ext4.
>>
>> However, I made a few changes to my initial ceph.conf.
>> Could you please check whether this configuration is reasonable?
>> Is it correct to set the "osd journal" location as done below?
>>
>> My new ceph.conf:
>> -----------
>> [global]
>>         pid file = /var/run/ceph/$name.pid
>>         debug ms = 1
>>         keyring = /etc/ceph/keyring.bin
>> [mon]
>>         mon data = /x01/mon$id
>>         debug mon = 20
>>         debug paxos = 20
>>         mon lease wiggle room = 0.5
>> [mon0]
>>         host = h1farm182
>>         mon addr = xxx.xxx.xxx.116:6789
>> [mon1]
>>         host = h1farm183
>>         mon addr = xxx.xxx.xxx.117:6789
>> [mds]
>>         debug mds = 10
>>         mds log max segments = 2
>>         keyring = /etc/ceph/keyring.$name
>> [mds0]
>>         host = h1farm182
>> [mds1]
>>         host = h1farm183
>> [osd]
>>         sudo = true
>>         keyring = /etc/ceph/keyring.$name
>>         osd data = /x02/osd$id
>>         osd journal = /x02/osd$id/journal
>>         osd journal size = 100
>>         debug osd = 20
>>         debug journal = 20
>>         debug filestore = 20
>> [osd0]
>>         host = h1farm183
>>         btrfs devs = /dev/sdb1
>> [osd1]
>>         host = h1farm184
>>         btrfs devs = /dev/sdb1
>> -----------
>>
>> Thank you for your help,
>>
>> Cheers,
>>
>> Bogdan
>>
>>
>> On Tue, 31 Aug 2010, Bogdan Lobodzinski wrote:
>>
>>>
>>> Hello Sage,
>>>
>>> On Mon, 30 Aug 2010, Sage Weil wrote:
>>>
>>>> On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote:
>>>>>
>>>>> Hello Sage,
>>>>>
>>>>> I moved to kernel 2.6.35, keeping the ext3 filesystem.
>>>>> After executing the same command:
>>>>> svn co https://root.cern.ch/svn/root/trunk root
>>>>>
>>>>> The system is dead again. The command and kjournald are stuck:
>>>>> bogdan  8539  0.9  0.6  31168 22040 pts/0  DL+  16:44  0:21 svn co
>>>>> https://root.cern.ch/svn/root/trunk root
>>>>> root    802   0.0  0.0      0     0 ?        D  12:59  0:01 [kjournald]
>>>>
>>>> Hmm.  Have you tried ext4?
>>>>
>>>> I stopped seeing this on my own machine with recent kernels, but it looks
>>>> like it isn't in fact fixed.  This should be reported to the ext4 list.
>>>> Are you running ceph via vstart.sh or a custom ceph.conf?
>>> I am using vstart.sh taken from a source tarball that I compiled myself,
>>> ceph-0.21.tar.gz  (http://ceph.newdream.net/download/)
>>> and the client from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git
>>>
>>> Cheers,
>>>
>>> Bogdan
>>>
>>>>
>>>> sage
>>>>
>>>>>
>>>>> Looks like the bug is not fixed, dmesg shows:
>>>>> ---------
>>>>> [14325.304068] kernel BUG at
>>>>> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385!
>>>>> [14325.304191] invalid opcode: 0000 [#1] SMP
>>>>> [14325.304263] last sysfs file:
>>>>> /sys/devices/pci0000:00/0000:00:00.0/device
>>>>> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss
>>>>> sunrpc
>>>>> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart
>>>>> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp
>>>>> serio_raw mptsas mptscsih mptbase scsi_transport_sas
>>>>> [14325.304266]
>>>>> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic
>>>>> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950
>>>>> [14325.304266] EIP: 0060:[<c0274a4d>] EFLAGS: 00210286 CPU: 1
>>>>> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
>>>>> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000
>>>>> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10
>>>>> [14325.304266]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>>>>> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70
>>>>> task.ti=f5822000)
>>>>> [14325.304266] Stack:
>>>>> [14325.304266]  000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007
>>>>> c8641454
>>>>> 007b7fff
>>>>> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6
>>>>> c4063ec0 00000000
>>>>> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8
>>>>> f5823cac c0256017
>>>>> [14325.304266] Call Trace:
>>>>> [14325.304266]  [<c0273a58>] ? read_block_bitmap+0x48/0x160
>>>>> [14325.304266]  [<c0274daf>] ? ext3_new_blocks+0x1ff/0x610
>>>>> [14325.304266]  [<c0256017>] ? mb_cache_entry_find_first+0x67/0x80
>>>>> [14325.304266]  [<c02751e5>] ? ext3_new_block+0x25/0x30
>>>>> [14325.304266]  [<c0287721>] ? ext3_xattr_block_set+0x481/0x550
>>>>> [14325.304266]  [<c0286490>] ? ext3_xattr_set_entry+0x20/0x2f0
>>>>> [14325.304266]  [<c0287b0b>] ? ext3_xattr_set_handle+0x31b/0x400
>>>>> [14325.304266]  [<c0287c65>] ? ext3_xattr_set+0x75/0xc0
>>>>> [14325.304266]  [<c0287d24>] ? ext3_xattr_user_set+0x74/0x80
>>>>> [14325.304266]  [<c023348b>] ? generic_setxattr+0x9b/0xb0
>>>>> [14325.304266]  [<c02333f0>] ? generic_setxattr+0x0/0xb0
>>>>> [14325.304266]  [<c0234084>] ? __vfs_setxattr_noperm+0x44/0x150
>>>>> [14325.304266]  [<c03017dc>] ? cap_inode_setxattr+0x2c/0x60
>>>>> [14325.304266]  [<c0234221>] ? vfs_setxattr+0x91/0xa0
>>>>> [14325.304266]  [<c02342e8>] ? setxattr+0xb8/0x110
>>>>> [14325.304266]  [<c0221d0e>] ? path_to_nameidata+0x1e/0x50
>>>>> [14325.304266]  [<c0223492>] ? link_path_walk+0x412/0x890
>>>>> [14325.304266]  [<c013a159>] ? enqueue_task_fair+0x39/0x80
>>>>> [14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
>>>>> [14325.304266]  [<c022ff3f>] ? mntput_no_expire+0x1f/0xd0
>>>>> [14325.304266]  [<c022168b>] ? putname+0x2b/0x40
>>>>> [14325.304266]  [<c022470a>] ? user_path_at+0x4a/0x80
>>>>> [14325.304266]  [<c0179902>] ? sys_futex+0x72/0x120
>>>>> [14325.304266]  [<c0234503>] ? sys_setxattr+0x83/0x90
>>>>> [14325.304266]  [<c05c9bb4>] ? syscall_call+0x7/0xb
>>>>> [14325.304266]  [<c05c0000>] ? cache_add_dev+0x73/0x195
>>>>> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83
>>>>> 32
>>>>> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff
>>>>> <0f>
>>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b
>>>>> [14325.304266] EIP: [<c0274a4d>] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
>>>>> SS:ESP 0068:f5823c10
>>>>> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]---
>>>>> [14384.001261] ceph: mds0 caps stale
>>>>> [14413.616132] ceph:  tid 33594 timed out on osd2, will reset osd
>>>>> [14628.992279] ceph: mds0 hung
>>>>> ---------
>>>>>
>>>>> As a next step I will try to use btrfs.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Bogdan
>>>>>
>>>>>
>>>>> On Fri, 27 Aug 2010, Sage Weil wrote:
>>>>>
>>>>>> Hi Bogdan,
>>>>>>
>>>>>> This is a bug in the ext3 xattr code.  It seems to be gone in 2.6.34 and
>>>>>> later.  Or, you can switch to btrfs!
>>>>>>
>>>>>> sage
>>>>>>
>>>>>>
>>>>>> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> working with ceph on my test configuration
>>>>>>> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP)
>>>>>>> After starting
>>>>>>> svn co https://root.cern.ch/svn/root/trunk root
>>>>>>>
>>>>>>> in the /ceph directory, the command became stuck, and so did:
>>>>>>> root      5303  0.0  0.0      0     0 ?        D    Aug26   0:00
>>>>>>> [kjournald]
>>>>>>> root     30181  0.0  0.0   6972  2056 pts/1    D+   13:46   0:00
>>>>>>> /usr//bin/cosd
>>>>>>> -i 2 -c /etc/ceph/ceph.conf
>>>>>>>
>>>>>>> Any mount or unmount also goes into state D.
>>>>>>> This is permanent behaviour of ceph once the command is started.
>>>>>>>
>>>>>>> dmesg shows:
>>>>>>> -------------
>>>>>>> [99048.567704] ------------[ cut here ]------------
>>>>>>> [99048.568767] kernel BUG at
>>>>>>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
>>>>>>> [99048.568767] invalid opcode: 0000 [#1] SMP
>>>>>>> [99048.568767] last sysfs file:
>>>>>>> /sys/devices/pci0000:00/0000:00:00.0/device
>>>>>>> [99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc
>>>>>>> ceph
>>>>>>> crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga
>>>>>>> vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac
>>>>>>> edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas
>>>>>>> usbhid mptsas mptscsih mptbase scsi_transport_sas
>>>>>>> [99048.596652]
>>>>>>> [99048.596652] Pid: 6258, comm: cosd Tainted: P
>>>>>>> (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950
>>>>>>> [99048.596652] EIP: 0060:[<c026dc8d>] EFLAGS: 00210296 CPU: 3
>>>>>>> [99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
>>>>>>> [99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000
>>>>>>> [99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14
>>>>>>> [99048.596652]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>>>>>>> [99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300
>>>>>>> task.ti=f5cca000)
>>>>>>> [99048.596652] Stack:
>>>>>>> [99048.596652]  00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c
>>>>>>> f6dd5494 02147fff
>>>>>>> [99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428
>>>>>>> f1058500 00000000
>>>>>>> [99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0
>>>>>>> f5ccbcb4 f5ccbc90
>>>>>>> [99048.596652] Call Trace:
>>>>>>> [99048.596652]  [<c026cc88>] ? read_block_bitmap+0x48/0x160
>>>>>>> [99048.596652]  [<c026e048>] ? ext3_new_blocks+0x228/0x6c0
>>>>>>> [99048.596652]  [<c024fbd7>] ? mb_cache_entry_find_first+0x67/0x80
>>>>>>> [99048.596652]  [<c026e505>] ? ext3_new_block+0x25/0x30
>>>>>>> [99048.596652]  [<c02809a4>] ? ext3_xattr_block_set+0x554/0x670
>>>>>>> [99048.596652]  [<c027f589>] ? ext3_xattr_set_entry+0x29/0x350
>>>>>>> [99048.596652]  [<c0280d8b>] ? ext3_xattr_set_handle+0x2cb/0x3e0
>>>>>>> [99048.596652]  [<c0280f15>] ? ext3_xattr_set+0x75/0xc0
>>>>>>> [99048.596652]  [<c0280fd6>] ? ext3_xattr_user_set+0x76/0x80
>>>>>>> [99048.596652]  [<c022dd8c>] ? generic_setxattr+0x9c/0xb0
>>>>>>> [99048.596652]  [<c022dcf0>] ? generic_setxattr+0x0/0xb0
>>>>>>> [99048.596652]  [<c022e984>] ? __vfs_setxattr_noperm+0x44/0x160
>>>>>>> [99048.596652]  [<c02fed4c>] ? cap_inode_setxattr+0x2c/0x60
>>>>>>> [99048.596652]  [<c022eb31>] ? vfs_setxattr+0x91/0xa0
>>>>>>> [99048.596652]  [<c022ebf8>] ? setxattr+0xb8/0x110
>>>>>>> [99048.596652]  [<c021d512>] ? __link_path_walk+0x632/0xca0
>>>>>>> [99048.596652]  [<c014e369>] ? enqueue_task_fair+0x39/0x80
>>>>>>> [99048.596652]  [<c022a9bf>] ? mntput_no_expire+0x1f/0xe0
>>>>>>> [99048.596652]  [<c022a9bf>] ? mntput_no_expire+0x1f/0xe0
>>>>>>> [99048.596652]  [<c021be45>] ? path_put+0x25/0x30
>>>>>>> [99048.596652]  [<c021ba8b>] ? putname+0x2b/0x40
>>>>>>> [99048.596652]  [<c021ea6a>] ? user_path_at+0x4a/0x80
>>>>>>> [99048.596652]  [<c0183242>] ? sys_futex+0x72/0x120
>>>>>>> [99048.596652]  [<c022ee13>] ? sys_setxattr+0x83/0x90
>>>>>>> [99048.596652]  [<c0109763>] ? sysenter_do_call+0x12/0x28
>>>>>>> [99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f
>>>>>>> 83
>>>>>>> 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1
>>>>>>> ff<0f>
>>>>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53
>>>>>>> [99048.596652] EIP: [<c026dc8d>]
>>>>>>> ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
>>>>>>> SS:ESP 0068:f5ccbc14
>>>>>>> [99049.044090] ---[ end trace 35860103963ee444 ]---
>>>>>>> h1farm184#
>>>>>>> --------------------
>>>>>>>
>>>>>>> my ceph.conf is:
>>>>>>> -------
>>>>>>> [global]
>>>>>>>        pid file = /var/run/ceph/$name.pid
>>>>>>>        debug ms = 1
>>>>>>>        keyring = /etc/ceph/keyring.bin
>>>>>>> ; monitors
>>>>>>> [mon]
>>>>>>>        ;Directory for monitor files
>>>>>>>        mon data = /x02/mon$id
>>>>>>>        debug mon = 20
>>>>>>>        debug paxos = 20
>>>>>>>        mon lease wiggle room = 0.5
>>>>>>>
>>>>>>> [mon0]
>>>>>>>        host = h1farm182
>>>>>>>        mon addr = xxx.xxx.xx.116:6789
>>>>>>> [mon1]
>>>>>>>        host = h1farm183
>>>>>>>        mon addr = xxx.xxx.xx.117:6789
>>>>>>> ; metadata servers
>>>>>>> [mds]
>>>>>>>        debug mds = 20
>>>>>>>        mds log max segments = 2
>>>>>>>        keyring = /etc/ceph/keyring.$name
>>>>>>> [mds0]
>>>>>>>        host = h1farm182
>>>>>>> [mds1]
>>>>>>>        host = h1farm183
>>>>>>> [osd]
>>>>>>>        sudo = true
>>>>>>>        osd data = /x02/osd$id
>>>>>>>        osd journal = /x02/osd$id/journal
>>>>>>>        osd journal size = 100
>>>>>>>        keyring = /etc/ceph/keyring.$name
>>>>>>>        debug osd = 20
>>>>>>>        debug journal = 20
>>>>>>>        debug filestore = 20
>>>>>>>        ;osd journal size = 100
>>>>>>> [osd0]
>>>>>>>        host = h1farm182
>>>>>>> [osd1]
>>>>>>>        host = h1farm183
>>>>>>> [osd2]
>>>>>>>        host = h1farm184
>>>>>>>
>>>>>>> -------
>>>>>>>
>>>>>>> Any idea how to improve the situation ?
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-09-03 15:02             ` Bogdan Lobodzinski
@ 2010-09-03 17:10               ` Yehuda Sadeh Weinraub
  2010-09-03 19:20                 ` Yehuda Sadeh Weinraub
  0 siblings, 1 reply; 27+ messages in thread
From: Yehuda Sadeh Weinraub @ 2010-09-03 17:10 UTC (permalink / raw)
  To: Bogdan Lobodzinski; +Cc: Wido den Hollander, Sage Weil, ceph-devel

On Fri, Sep 3, 2010 at 8:02 AM, Bogdan Lobodzinski <bogdan@mail.desy.de> wrote:
>
> Hello all,
>
> Let me continue with my troubles; the subject can stay the same.
> As I wrote, my ceph configuration survived my critical test
> svn co https://root.cern.ch/svn/root/trunk root
> and then suddenly, during the night, at 5 o'clock, ceph got stuck again, without any user activity and with no work at all on the /ceph directory.
> The node is running as
> mds1, mon1, osd0
>
> System log file reports (the problem starts with entry:
> "Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale" ):
> --------
> Sep  1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded
> Sep  1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
> Sep  1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
> Sep  1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module.
> Sep  1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module.
> Sep  1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module.
> Sep  1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
> Sep  1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151
> Sep  1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
> ...
> Sep  1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1
> Sep  1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71
> Sep  1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established
> Sep  2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
> Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale
> Sep  2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale
> Sep  2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start
> Sep  2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P           (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[<c01ea907>] EFLAGS: 00010246 CPU: 1
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c0138307>] ? kunmap+0x57/0x60
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad7d54>] ? ceph_pagelist_append+0x54/0x110 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ada09d>] ? encode_caps_cb+0x16d/0x1f0 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad89e0>] ? iterate_session_caps+0xa0/0x170 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad9f30>] ? encode_caps_cb+0x0/0x1f0 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9adb46f>] ? send_mds_reconnect+0x23f/0x3b0 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9adb804>] ? ceph_mdsc_handle_map+0x224/0x380 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9addd9e>] ? dispatch+0x8e/0x430 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad7776>] ? con_work+0x1cf6/0x1ed0 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c010807d>] ? __switch_to+0xcd/0x180
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c0146d83>] ? finish_task_switch+0x43/0xc0
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c05b10dc>] ? schedule+0x44c/0x840
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016bbce>] ? run_workqueue+0x8e/0x150
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad5a80>] ? con_work+0x0/0x1ed0 [ceph]
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016bd14>] ? worker_thread+0x84/0xe0
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016fc70>] ? autoremove_wake_function+0x0/0x50
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016bc90>] ? worker_thread+0x0/0xe0
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016f9e4>] ? kthread+0x74/0x80
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c016f970>] ? kthread+0x0/0x80
> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c010a4e7>] ? kernel_thread_helper+0x7/0x10
> Sep  2 05:45:27 h1farm183 kernel: [72472.304298] ---[ end trace 47e346731d47774d ]---

Is that all of the log? The part showing what triggered this trace seems to be missing.

> --------
>
> The node was completely stuck.
> Do you know what the reason could be?

What client version are you using? This looks like a 32-bit related
client issue that we haven't hit, as we mostly run 64 bit. Does this
client have more than 4GB of memory? If not, you can try running it on
a non-pae kernel.
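
The two questions above (is this a PAE kernel, and does the box have more than 4GB of RAM) can be answered from the kernel release string and the installed memory. The following is only a rough sketch: the helper names `release_is_pae`, `ram_exceeds_4gib` and `report_pae_status` are made up for this example, and the "-pae" suffix test assumes the Ubuntu kernel naming seen in the logs (`2.6.32-24-generic-pae`). Other distributions may name their kernels differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysinfo.h>
#include <sys/utsname.h>

/* Hypothetical helper: true if the release string ends in "-pae",
 * as in "2.6.32-24-generic-pae" from the oops above. */
static int release_is_pae(const char *release)
{
	static const char suffix[] = "-pae";
	size_t rlen = strlen(release);
	size_t slen = sizeof(suffix) - 1;

	return rlen >= slen && strcmp(release + rlen - slen, suffix) == 0;
}

/* Hypothetical helper: true if the machine has more than 4 GiB of RAM. */
static int ram_exceeds_4gib(unsigned long long total_bytes)
{
	return total_bytes > (4ULL << 30);
}

/* Print both answers for the running system; returns 0 on success. */
static int report_pae_status(void)
{
	struct utsname un;
	struct sysinfo si;
	unsigned long long total;

	if (uname(&un) != 0 || sysinfo(&si) != 0)
		return -1;

	/* totalram is counted in units of mem_unit bytes. */
	total = (unsigned long long)si.totalram * si.mem_unit;
	printf("release %s: pae=%d, ram>4GiB=%d\n", un.release,
	       release_is_pae(un.release), ram_exceeds_4gib(total));
	return 0;
}
```

Calling `report_pae_status()` from `main()` on the client node prints one line with both answers. Note that this only tells you whether a non-PAE kernel is an option; it says nothing about whether the trace would go away.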

Yehuda

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-09-03 17:10               ` Yehuda Sadeh Weinraub
@ 2010-09-03 19:20                 ` Yehuda Sadeh Weinraub
  0 siblings, 0 replies; 27+ messages in thread
From: Yehuda Sadeh Weinraub @ 2010-09-03 19:20 UTC (permalink / raw)
  To: Bogdan Lobodzinski; +Cc: Wido den Hollander, Sage Weil, ceph-devel

> On Fri, Sep 3, 2010 at 8:02 AM, Bogdan Lobodzinski <bogdan@mail.desy.de> wrote:
>>
>> Hello all,
>>
>> Let me continue with my troubles; the subject can stay the same.
>> As I wrote, my ceph configuration survived my critical test
>> svn co https://root.cern.ch/svn/root/trunk root
>> and then suddenly, during the night, at 5 o'clock, ceph got stuck again, without any user activity and with no work at all on the /ceph directory.
>> The node is running as
>> mds1, mon1, osd0
>>
>> System log file reports (the problem starts with entry:
>> "Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale" ):
>> --------
>> Sep  1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded
>> Sep  1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
>> Sep  1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep  1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module.
>> Sep  1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module.
>> Sep  1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module.
>> Sep  1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep  1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151
>> Sep  1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> ...
>> Sep  1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1
>> Sep  1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71
>> Sep  1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established
>> Sep  2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
>> Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale
>> Sep  2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale
>> Sep  2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start
>> Sep  2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm]
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P           (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[<c01ea907>] EFLAGS: 00010246 CPU: 1
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<c0138307>] ? kunmap+0x57/0x60
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [<f9ad7d54>] ? ceph_pagelist_append+0x54/0x110 [ceph]
...
>> The node was completely stuck.
>> Do you know what the reason could be?

Maybe the following patch fixes it? I'll push a fix to the unstable
branch, let me know if it works for you.

Thanks,
Yehuda

diff --git a/fs/ceph/pagelist.c b/fs/ceph/pagelist.c
index b6859f4..46a368b 100644
--- a/fs/ceph/pagelist.c
+++ b/fs/ceph/pagelist.c
@@ -5,10 +5,18 @@

 #include "pagelist.h"

+static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
+{
+	struct page *page = list_entry(pl->head.prev, struct page,
+				       lru);
+	kunmap(page);
+}
+
 int ceph_pagelist_release(struct ceph_pagelist *pl)
 {
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
+
 	while (!list_empty(&pl->head)) {
 		struct page *page = list_first_entry(&pl->head, struct page,
 						     lru);
@@ -26,7 +34,7 @@ static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
 	pl->room += PAGE_SIZE;
 	list_add_tail(&page->lru, &pl->head);
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
 	pl->mapped_tail = kmap(page);
 	return 0;
 }
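
For readers following along: the point of the patch above is the argument type. `pl->mapped_tail` holds the address returned by `kmap()`, whereas `kunmap()` takes the `struct page *` itself. As far as I understand, on kernels without highmem `kunmap()` does nothing with its argument, which would explain why this only showed up on a 32-bit PAE box. The userspace model below is a sketch of that contract only; every `toy_*` name is invented for illustration and is not the kernel API. One detail worth double-checking in the second hunk, if I read it correctly: `list_add_tail()` runs before the helper is called, so `pl->head.prev` would already point at the new, not yet mapped page. The sketch sidesteps that by tracking the tail page explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the real kernel type. */
struct toy_page {
	int  map_count;   /* how many times the page is currently mapped */
	char data[64];    /* the memory that a mapping exposes */
};

/* Like kmap(): takes a page, returns the address of its contents. */
static void *toy_kmap(struct toy_page *page)
{
	page->map_count++;
	return page->data;
}

/* Like kunmap(): takes the PAGE, not the address toy_kmap() returned.
 * Returns -1 for a page that is not mapped, which is the situation
 * where the real kunmap_high() hits a BUG. */
static int toy_kunmap(struct toy_page *page)
{
	if (page->map_count == 0)
		return -1;
	page->map_count--;
	return 0;
}

/* Toy pagelist: keep the mapped address for writing, but remember the
 * page it came from so that unmapping can go through the page. */
struct toy_pagelist {
	struct toy_page *tail;         /* page that is currently mapped */
	void            *mapped_tail;  /* address from toy_kmap(tail) */
};

static int toy_unmap_tail(struct toy_pagelist *pl)
{
	int err = toy_kunmap(pl->tail);   /* pass the page, not the address */

	if (err == 0)
		pl->mapped_tail = NULL;
	return err;
}
```

Unmapping twice returns -1 in this model, which mirrors the unbalanced `kunmap()` that the trace points at.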

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-24 18:43                     ` Sage Weil
  2010-02-24 23:21                       ` Talyansky, Roman
@ 2010-02-25 10:07                       ` Talyansky, Roman
  1 sibling, 0 replies; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-25 10:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

The file with the new traces can be found at https://sapmats-de.sap-ag.de/download/download.cgi?id=A90NAAZ5KZG8IXQP7Z9Z084IKX6HF47GE2OEEPT740RBRSGJNO

Thanks,

Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Wednesday, February 24, 2010 8:44 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Sage,
> 
> Besides the bug with the return value of write operation, the system hangs in a write operation.
> You can access the trace files of ceph servers at 
> https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The .tar.gz appears to be corrupt (gunzip complains)... :(

> The traces were collected without "debug mds = 20" in the configuration 
> file. I still have the hang system available. I also run the systems to 
> regenerate the hang with "debug mds = 20" defined.
> 
> It would be great if we could have a chat to resolve the hang in about 4 
> hours.

Sounds good.  We should just be back from lunch, and will be in #ceph on 
irc.oftc.net, or on jabber!

sage

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-24 18:43                     ` Sage Weil
@ 2010-02-24 23:21                       ` Talyansky, Roman
  2010-02-25 10:07                       ` Talyansky, Roman
  1 sibling, 0 replies; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-24 23:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

That's right, the file was corrupted. My network is extremely slow at the moment, so I will probably open access to an uncorrupted file tomorrow.
Meanwhile, the system with the higher trace level also hangs, so I will be able to send you more informative traces.

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Wednesday, February 24, 2010 8:44 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Sage,
> 
> Besides the bug with the return value of write operation, the system hangs in a write operation.
> You can access the trace files of ceph servers at 
> https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The .tar.gz appears to be corrupt (gunzip complains)... :(

> The traces were collected without "debug mds = 20" in the configuration 
> file. I still have the hang system available. I also run the systems to 
> regenerate the hang with "debug mds = 20" defined.
> 
> It would be great if we could have a chat to resolve the hang in about 4 
> hours.

Sounds good.  We should just be back from lunch, and will be in #ceph on 
irc.oftc.net, or on jabber!

sage

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-24 16:42                   ` Talyansky, Roman
@ 2010-02-24 18:43                     ` Sage Weil
  2010-02-24 23:21                       ` Talyansky, Roman
  2010-02-25 10:07                       ` Talyansky, Roman
  0 siblings, 2 replies; 27+ messages in thread
From: Sage Weil @ 2010-02-24 18:43 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: ceph-devel

Hi Roman,

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Sage,
> 
> Besides the bug with the return value of write operation, the system hangs in a write operation.
> You can access the trace files of ceph servers at 
> https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The .tar.gz appears to be corrupt (gunzip complains)... :(

> The traces were collected without "debug mds = 20" in the configuration 
> file. I still have the hang system available. I also run the systems to 
> regenerate the hang with "debug mds = 20" defined.
> 
> It would be great if we could have a chat to resolve the hang in about 4 
> hours.

Sounds good.  We should just be back from lunch, and will be in #ceph on 
irc.oftc.net, or on jabber!

sage

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-24 14:56                 ` Sage Weil
@ 2010-02-24 16:42                   ` Talyansky, Roman
  2010-02-24 18:43                     ` Sage Weil
  0 siblings, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-24 16:42 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

Besides the bug with the return value of the write operation, the system also hangs in a write operation.
You can access the trace files of the ceph servers at
https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The traces were collected without "debug mds = 20" in the configuration file. I still have the hung system available. I am also rerunning the systems to reproduce the hang with "debug mds = 20" defined.

It would be great if we could have a chat to resolve the hang in about 4 hours.

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Wednesday, February 24, 2010 4:56 PM
To: Talyansky, Roman
Cc: Yehuda Sadeh Weinraub; ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

On Wed, 24 Feb 2010, Talyansky, Roman wrote:

> Hi Yehuda,
> 
> Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.
> It also seems that the code at that location became a bit more complex - new #if occurred:
> 
> #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
> 
> And consequently the code under #else should be fixed as well.

Yeah.  I pushed this fix (and another fix that comes up when there's >1 
mds) to the stable 'master' branch of ceph-client.git and 
ceph-client-standalone.git.  The 'master-backport' branch of 
ceph-client-standalone.git has the backport #ifdefs (and builds back to 
2.6.28 or so).

Thanks!
sage


> 
> Thanks,
> 
> Roman
> 
> From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
> Sent: Tuesday, February 23, 2010 8:11 PM
> To: Talyansky, Roman
> Cc: Sage Weil; ceph-devel@lists.sourceforge.net
> Subject: Re: [ceph-devel] Write operation is stuck
> 
> 
> On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman <roman.talyansky@sap.com> wrote:
> Hi Sage,
> 
> As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag,  "0" is always returned although the data is written to disk.
> This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.
> 
> Yeah, thanks. A fix was pushed to the unstable branch. We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2c4ae44..88932c9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
>         struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
>         loff_t endoff = pos + iov->iov_len;
>         int got = 0;
> -       int ret;
> +       int ret, err;
> 
>         if (ceph_snap(inode) != CEPH_NOSNAP)
>                 return -EROFS;
> @@ -838,9 +838,12 @@ retry_snap:
> 
>                 if ((ret >= 0 || ret == -EIOCBQUEUED) &&
>                     ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
> -                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
> -                       ret = vfs_fsync_range(file, file->f_path.dentry,
> +                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
> +                       err = vfs_fsync_range(file, file->f_path.dentry,
>                                               pos, pos + ret - 1, 1);
> +                       if (err < 0)
> +                               ret = err;
> +               }
>         }
>         if (ret >= 0) {
>                 spin_lock(&inode->i_lock);
> 
> 
> 
> Yehuda
> 
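
To make the effect of the quoted fs/ceph/file.c patch easier to see: before the change, the result of the sync step was assigned straight to `ret`, so a successful sync (return value 0) replaced the byte count. The sketch below models just that control flow in userspace. The names `finish_write_buggy`, `finish_write_fixed`, `sync_ok` and `sync_fail` are invented for this example, and the `-EIOCBQUEUED` and NEARFULL parts of the real condition are left out.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for the sync step: 0 on success, negative
 * errno on failure, like vfs_fsync_range() in the patch. */
typedef int (*sync_fn)(void);

static int sync_ok(void)   { return 0; }
static int sync_fail(void) { return -EIO; }

/* Pre-patch shape: the sync result overwrites the byte count. */
static ssize_t finish_write_buggy(ssize_t written, int want_sync,
				  sync_fn do_sync)
{
	ssize_t ret = written;

	if (ret >= 0 && want_sync)
		ret = do_sync();   /* a successful sync turns ret into 0 */
	return ret;
}

/* Post-patch shape: keep the byte count unless the sync fails. */
static ssize_t finish_write_fixed(ssize_t written, int want_sync,
				  sync_fn do_sync)
{
	ssize_t ret = written;
	int err;

	if (ret >= 0 && want_sync) {
		err = do_sync();
		if (err < 0)
			ret = err;
	}
	return ret;
}
```

With a 4096-byte write and a successful sync, the first variant reports 0 and the second reports 4096, which matches the behavior described in the report.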

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-24 13:34               ` Talyansky, Roman
@ 2010-02-24 14:56                 ` Sage Weil
  2010-02-24 16:42                   ` Talyansky, Roman
  0 siblings, 1 reply; 27+ messages in thread
From: Sage Weil @ 2010-02-24 14:56 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: ceph-devel

On Wed, 24 Feb 2010, Talyansky, Roman wrote:

> Hi Yehuda,
> 
> Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.
> It also seems that the code at that location became a bit more complex - new #if occurred:
> 
> #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
> 
> And consequently the code under #else should be fixed as well.

Yeah.  I pushed this fix (and another fix that comes up when there's >1 
mds) to the stable 'master' branch of ceph-client.git and 
ceph-client-standalone.git.  The 'master-backport' branch of 
ceph-client-standalone.git has the backport #ifdefs (and builds back to 
2.6.28 or so).

Thanks!
sage


> 
> Thanks,
> 
> Roman
> 
> From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
> Sent: Tuesday, February 23, 2010 8:11 PM
> To: Talyansky, Roman
> Cc: Sage Weil; ceph-devel@lists.sourceforge.net
> Subject: Re: [ceph-devel] Write operation is stuck
> 
> 
> On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman <roman.talyansky@sap.com> wrote:
> Hi Sage,
> 
> As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag,  "0" is always returned although the data is written to disk.
> This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.
> 
> Yeah, thanks. A fix was pushed to the unstable branch. We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2c4ae44..88932c9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
>         struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
>         loff_t endoff = pos + iov->iov_len;
>         int got = 0;
> -       int ret;
> +       int ret, err;
> 
>         if (ceph_snap(inode) != CEPH_NOSNAP)
>                 return -EROFS;
> @@ -838,9 +838,12 @@ retry_snap:
> 
>                 if ((ret >= 0 || ret == -EIOCBQUEUED) &&
>                     ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
> -                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
> -                       ret = vfs_fsync_range(file, file->f_path.dentry,
> +                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
> +                       err = vfs_fsync_range(file, file->f_path.dentry,
>                                               pos, pos + ret - 1, 1);
> +                       if (err < 0)
> +                               ret = err;
> +               }
>         }
>         if (ret >= 0) {
>                 spin_lock(&inode->i_lock);
> 
> 
> 
> Yehuda
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-23 18:11             ` Yehuda Sadeh Weinraub
@ 2010-02-24 13:34               ` Talyansky, Roman
  2010-02-24 14:56                 ` Sage Weil
  0 siblings, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-24 13:34 UTC (permalink / raw)
  To: Yehuda Sadeh Weinraub; +Cc: Sage Weil, ceph-devel


[-- Attachment #1.1: Type: text/plain, Size: 2486 bytes --]

Hi Yehuda,

Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.
It also seems that the code at that location has become a bit more complex; a new #if has appeared:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)

And consequently the code under #else should be fixed as well.
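
For reference, the guard above compares two packed integers. The following is a minimal sketch of that arithmetic, assuming the classic definition of KERNEL_VERSION(a, b, c) as (a << 16) + (b << 8) + c. The helper names kernel_version and uses_2_6_32_branch are hypothetical and only mirror the macro in user space; they are not part of the ceph sources.

```cpp
#include <cstdint>

// Hypothetical user-space mirror of the kernel's KERNEL_VERSION(a, b, c)
// macro, assuming the classic packing (a << 16) + (b << 8) + c.
constexpr std::uint32_t kernel_version(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
    return (a << 16) + (b << 8) + c;
}

// True when code guarded by
//   #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
// would be compiled for a kernel whose packed version is `code`;
// false means the #else branch is the one that gets built.
constexpr bool uses_2_6_32_branch(std::uint32_t code)
{
    return code >= kernel_version(2, 6, 32);
}
```

Under this assumption a 2.6.31 build takes the #else branch, which is why a fix applied only inside the #if would not reach it.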

Thanks,

Roman

From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
Sent: Tuesday, February 23, 2010 8:11 PM
To: Talyansky, Roman
Cc: Sage Weil; ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck


On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman <roman.talyansky@sap.com> wrote:
Hi Sage,

As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag,  "0" is always returned although the data is written to disk.
This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.

Yeah, thanks. A fix was pushed to the unstable branch. We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2c4ae44..88932c9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
        struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
        loff_t endoff = pos + iov->iov_len;
        int got = 0;
-       int ret;
+       int ret, err;

        if (ceph_snap(inode) != CEPH_NOSNAP)
                return -EROFS;
@@ -838,9 +838,12 @@ retry_snap:

                if ((ret >= 0 || ret == -EIOCBQUEUED) &&
                    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
-                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
-                       ret = vfs_fsync_range(file, file->f_path.dentry,
+                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
+                       err = vfs_fsync_range(file, file->f_path.dentry,
                                              pos, pos + ret - 1, 1);
+                       if (err < 0)
+                               ret = err;
+               }
        }
        if (ret >= 0) {
                spin_lock(&inode->i_lock);



Yehuda

[-- Attachment #1.2: Type: text/html, Size: 9404 bytes --]

[-- Attachment #3: Type: text/plain, Size: 161 bytes --]

_______________________________________________
Ceph-devel mailing list
Ceph-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ceph-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-23 14:11           ` Talyansky, Roman
@ 2010-02-23 18:11             ` Yehuda Sadeh Weinraub
  2010-02-24 13:34               ` Talyansky, Roman
  0 siblings, 1 reply; 27+ messages in thread
From: Yehuda Sadeh Weinraub @ 2010-02-23 18:11 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: Sage Weil, ceph-devel


[-- Attachment #1.1: Type: text/plain, Size: 1864 bytes --]

On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman <roman.talyansky@sap.com> wrote:

> Hi Sage,
>
> As you advised us, we switched to the release 0.19 of ceph and ran into
> another bug in the ceph client. When writing to a file with the O_SYNC flag,
>  "0" is always returned although the data is written to disk.
> This poses a problem in our benchmark which uses the return value as number
> of bytes written. Also it seems that such behavior infringes the POSIX
> write() contract.
>
>
Yeah, thanks. A fix was pushed to the unstable branch. We will probably
start maintaining a stable version that will contain such fixes, but you can
apply this in the mean time:

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2c4ae44..88932c9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
        struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
        loff_t endoff = pos + iov->iov_len;
        int got = 0;
-       int ret;
+       int ret, err;

        if (ceph_snap(inode) != CEPH_NOSNAP)
                return -EROFS;
@@ -838,9 +838,12 @@ retry_snap:

                if ((ret >= 0 || ret == -EIOCBQUEUED) &&
                    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
-                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
-                       ret = vfs_fsync_range(file, file->f_path.dentry,
+                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
+                       err = vfs_fsync_range(file, file->f_path.dentry,
                                              pos, pos + ret - 1, 1);
+                       if (err < 0)
+                               ret = err;
+               }
        }
        if (ret >= 0) {
                spin_lock(&inode->i_lock);



Yehuda
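
The effect of the patch on the return value can be modelled in user space. The sketch below is not kernel code: `written` stands in for the byte count produced by the write path, `sync_result` for the value returned by the fsync call (0 on success, a negative errno on failure), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>

// Before the patch: the fsync result overwrites the byte count, so a
// successful O_SYNC write ends up reporting 0 bytes.
long finish_write_before_patch(long written, int sync_result)
{
    assert(sync_result <= 0);   // modelled as 0 or a negative errno
    long ret = written;
    ret = sync_result;          // byte count is lost here
    return ret;
}

// After the patch: the fsync result goes into a separate variable and
// replaces the byte count only when the sync actually failed.
long finish_write_after_patch(long written, int sync_result)
{
    assert(sync_result <= 0);   // modelled as 0 or a negative errno
    long ret = written;
    int err = sync_result;
    if (err < 0)
        ret = err;
    return ret;
}
```

In this model a 100-byte write followed by a successful sync yields 0 before the patch and 100 after it, while a failed sync still surfaces its error code.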

[-- Attachment #1.2: Type: text/html, Size: 2602 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-19 18:39         ` Sage Weil
@ 2010-02-23 14:11           ` Talyansky, Roman
  2010-02-23 18:11             ` Yehuda Sadeh Weinraub
  0 siblings, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-23 14:11 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 3262 bytes --]

Hi Sage,

As you advised us, we switched to release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file opened with the O_SYNC flag, write() always returns 0 even though the data is written to disk.
This is a problem for our benchmark, which uses the return value as the number of bytes written. Such behavior also appears to violate the POSIX write() contract.

Attached is a small unit test in c++.
The unit test creates 2 files which are exactly the same, both filled randomly with numbers 0-9.
Afterwards, both files are closed.
Then one file is reopened and filled with 1's.

Running the test:
$ g++ temp.cc
$ ./a.out 100  (this is the number of bytes in the files)
Each time 0 is returned it is printed out on the screen.
Run the executable a.out from within a directory on a ceph file system.

After the program  finishes you will find 2 files:
./test  - filled with ones
./test.start - filled with random numeric data

If you run this test on NFS and ceph you will see that no errors are printed out on the NFS file system, and 100 errors are printed out on ceph.
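
A benchmark that relies on the write() return value usually wraps the call in a loop along the following lines. This is a minimal sketch, not the code from our benchmark; the helper name write_all is hypothetical. It retries on short writes and on EINTR, and treats a return of 0 for a non-zero request as a failure instead of looping forever.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: write exactly `len` bytes to `fd`.
// Returns true only when every byte was accepted by write().
bool write_all(int fd, const char* buf, size_t len)
{
    assert(buf != nullptr || len == 0);
    size_t done = 0;
    while (done < len) {
        ssize_t n = ::write(fd, buf + done, len - done);
        if (n > 0) {
            done += static_cast<size_t>(n);   // partial progress: keep going
        } else if (n < 0 && errno == EINTR) {
            continue;                         // interrupted before any data moved
        } else {
            // n < 0: a real error, errno says which one.
            // n == 0 for a non-zero request: no progress was made.
            return false;
        }
    }
    return true;
}
```

With the behaviour described above (0 returned for a 1-byte write), such a helper reports failure for every call even though the data reaches the disk.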

Thanks,

Roman & Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Friday, February 19, 2010 8:39 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Fri, 19 Feb 2010, Talyansky, Roman wrote:
> Since I test several ceph versions simultaneously I could confuse the error checking at different nodes.
> I'll double check this and let you know.

Thanks.  If you haven't switched to the just-released 0.19, now might be 
the time to do that.

> > It also looks like the IO is synchronous, which may have something 
> > to do with your performance.  Are you mounting with -o sync or using 
> > direct IO, or are multiple clients reading and writing to the same file or 
> > something?
>
> The IO is indeed synchronous. However the performance under ceph is much 
> worse than even under nfs, which looks strange. I do not mount with -o 
> sync. And in our experiments multiple clients read and write the same 
> file.

If you are accessing the same file from multiple clients, then any 
comparison with nfs is going to be misleading.  NFS provides only close to 
open consistency, so IO will be buffered and inconsistent.  Ceph provides 
fully consistent semantics by switching to synchronous IO when there are 
multiple clients.  Ceph will be slower, but correct; nfs will be fast, but 
incorrect.

If your application is smart enough to handle its own consistency (each 
client is writing to a different region of the file) then you probably 
want something along the lines of O_LAZY [1], so that the application can 
tell the FS not to worry about consistency and stick with buffered IO.  
Unfortunately O_LAZY doesn't exist in Linux at this point.  There is some 
preliminary support for it in Ceph... if that's what you're looking for, 
we can cook up some patches for you.

If you can find us in #ceph on irc.oftc.net that might be a quicker way to 
diagnose the performance problems with your workload.

Thanks!
sage

[1] http://www.pdl.cmu.edu/posix/docs/posix_lazy_io.pdf

[-- Attachment #2: temp.cc --]
[-- Type: application/octet-stream, Size: 1658 bytes --]

#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <vector>
#include <string>
#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
using namespace std;


#define BUF_LEN 100000
#define BEGIN_INPUT_F_NAME "test.start"
#define F_NAME "test"
int main(int argc, char** argv)
{

				switch(argc){
								case 2:
												cout<<"File size is "<<atoi(argv[1])<<endl;		
												break;	
								default:
												cerr<<"Usage: "<<argv[0]<<" size of file"<<endl;
												exit(1);
				}
				int fSize	= atoi(argv[1]);
				ofstream beginFile;
				ofstream workFile;
				beginFile.open(BEGIN_INPUT_F_NAME);
				workFile.open(F_NAME);

				int ran=0;
				for(int i=0;i<fSize;i++){
								ran=rand();
								ran=48+ran%10;
								beginFile<<(char)ran;
								workFile<<(char)ran;
				}
				beginFile.close();
				workFile.close();

				char buff[]={49};
				//Start filling files with ones
				//

				int flags = O_SYNC|O_RDWR;
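				int fd = ::open(F_NAME, flags);
				if (fd < 0) {	// open() returns -1 on failure; 0 is a valid descriptor
								cerr << " open problem with: " << F_NAME << " error=" << strerror(errno) << endl;
								exit(1);	// nothing below can work without the file
				}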

				for(int i = 0; i <fSize; i++){
								off_t res = ::lseek(fd, i, SEEK_SET);
								if (res != i) {
												cerr << "seek op failed res=" << res << " offset=" << i << endl;
								}
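								ssize_t written = ::write(fd,buff,1 );	// write() returns ssize_t, not off_t
								if (written != 1){
												// errno is only meaningful when write() returned -1
												cerr << "res=" << written << " write error=" << (written < 0 ? strerror(errno) : "short write") << std::endl;
								}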

				}

				int res_close = ::close(fd); 
				if (res_close == -1){
								cerr << "close error=" << strerror(errno) << std::endl;
				}

				exit(0);
}


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-19 15:40       ` Talyansky, Roman
@ 2010-02-19 18:39         ` Sage Weil
  2010-02-23 14:11           ` Talyansky, Roman
  0 siblings, 1 reply; 27+ messages in thread
From: Sage Weil @ 2010-02-19 18:39 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: ceph-devel

Hi Roman,

On Fri, 19 Feb 2010, Talyansky, Roman wrote:
> Since I test several ceph versions simultaneously I could confuse the error checking at different nodes.
> I'll double check this and let you know.

Thanks.  If you haven't switched to the just-released 0.19, now might be 
the time to do that.

> > It also looks like the IO is synchronous, which may have something 
> > to do with your performance.  Are you mounting with -o sync or using 
> > direct IO, or are multiple clients reading and writing to the same file or 
> > something?
>
> The IO is indeed synchronous. However the performance under ceph is much 
> worse than even under nfs, which looks strange. I do not mount with -o 
> sync. And in our experiments multiple clients read and write the same 
> file.

If you are accessing the same file from multiple clients, then any 
comparison with nfs is going to be misleading.  NFS provides only close to 
open consistency, so IO will be buffered and inconsistent.  Ceph provides 
fully consistent semantics by switching to synchronous IO when there are 
multiple clients.  Ceph will be slower, but correct; nfs will be fast, but 
incorrect.

If your application is smart enough to handle its own consistency (each 
client is writing to a different region of the file) then you probably 
want something along the lines of O_LAZY [1], so that the application can 
tell the FS not to worry about consistency and stick with buffered IO.  
Unfortunately O_LAZY doesn't exist in Linux at this point.  There is some 
preliminary support for it in Ceph... if that's what you're looking for, 
we can cook up some patches for you.
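
To make the "different region of the file" case concrete, here is a minimal application-side sketch. It does not implement O_LAZY and does not change how the file system handles consistency; it only shows one way an application could keep each client inside its own byte range. The fixed-size region layout and both function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical layout: the file is split into fixed-size regions and
// client number `client` owns bytes [client * region, (client + 1) * region).
off_t region_start(int client, size_t region)
{
    return static_cast<off_t>(client) * static_cast<off_t>(region);
}

// Write `len` bytes at offset `off` inside the region owned by `client`.
// Refuses any write that would spill into a neighbouring region, since
// that is the case the application has promised to avoid.
// Returns the pwrite() result, or -1 if the request is out of bounds.
ssize_t write_in_own_region(int fd, int client, size_t region,
                            size_t off, const char* buf, size_t len)
{
    assert(buf != nullptr || len == 0);
    if (client < 0 || off > region || len > region - off)
        return -1;
    return ::pwrite(fd, buf, len,
                    region_start(client, region) + static_cast<off_t>(off));
}
```

Using pwrite() with an explicit offset also avoids any dependence on a shared file position between threads of the same client.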

If you can find us in #ceph on irc.oftc.net that might be a quicker way to 
diagnose the performance problems with your workload.

Thanks!
sage

[1] http://www.pdl.cmu.edu/posix/docs/posix_lazy_io.pdf

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-16 18:35     ` Sage Weil
@ 2010-02-19 15:40       ` Talyansky, Roman
  2010-02-19 18:39         ` Sage Weil
  0 siblings, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-19 15:40 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

Thanks for the answer.

> It looks like dmesg shows it trying to connect to the monitor at .70, but you tested .83?
Since I am testing several ceph versions simultaneously, I may have mixed up the error checking on different nodes.
I'll double check this and let you know.

> It also looks like the IO is synchronous, which may have something 
> to do with your performance.  Are you mounting with -o sync or using 
> direct IO, or are multiple clients reading and writing to the same file or 
> something?
The IO is indeed synchronous. However the performance under ceph is much worse than even under nfs, which looks strange. I do not mount with -o sync. And in our experiments multiple clients read and write the same file.

Thanks,
Roman


-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Tuesday, February 16, 2010 8:35 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

On Tue, 16 Feb 2010, Talyansky, Roman wrote:

> Hi Sage,
> 
> I am trying to reproduce the hang with the latest client and servers.
> I am able to start the servers, however mount fails with input/output error 5. The dmesg listing shows the following info:
> 
> [17008.244739] ceph: loaded 0.18.0 (mon/mds/osd proto 15/30/22)
> [17015.888143] ceph: mon0 10.55.147.70:6789 connection failed
> [17025.880170] ceph: mon0 10.55.147.70:6789 connection failed
> [17035.880121] ceph: mon0 10.55.147.70:6789 connection failed
> [17045.880189] ceph: mon0 10.55.147.70:6789 connection failed
> [17055.880130] ceph: mon0 10.55.147.70:6789 connection failed
> [17065.880113] ceph: mon0 10.55.147.70:6789 connection failed
> [17075.880170] ceph: mon0 10.55.147.70:6789 connection failed
> 
> The server is reachable, as the following command output shows:
> 
> $ nc 10.55.147.83 6789
> ceph v027

It looks like dmesg shows it trying to connect to the monitor at .70, but 
you tested .83?

> I started running the experiments with ceph 0.18 using the 
> configuration, where clients and servers run on separate nodes. It turns 
> out that the performance is extremely bad. Looking at dmesg trace I see 
> ceph-related faults (the partial trace is attached to the email).

The oops in the attached trace.txt was fixed last week in the unstable 
code.  It also looks like the IO is synchronous, which may have something 
to do with your performance.  Are you mounting with -o sync or using 
direct IO, or are multiple clients reading and writing to the same file or 
something?

Thanks-
sage


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-16 17:27   ` Talyansky, Roman
@ 2010-02-16 18:35     ` Sage Weil
  2010-02-19 15:40       ` Talyansky, Roman
  0 siblings, 1 reply; 27+ messages in thread
From: Sage Weil @ 2010-02-16 18:35 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: ceph-devel

On Tue, 16 Feb 2010, Talyansky, Roman wrote:

> Hi Sage,
> 
> I am trying to reproduce the hang with the latest client and servers.
> I am able to start the servers, however mount fails with input/output error 5. The dmesg listing shows the following info:
> 
> [17008.244739] ceph: loaded 0.18.0 (mon/mds/osd proto 15/30/22)
> [17015.888143] ceph: mon0 10.55.147.70:6789 connection failed
> [17025.880170] ceph: mon0 10.55.147.70:6789 connection failed
> [17035.880121] ceph: mon0 10.55.147.70:6789 connection failed
> [17045.880189] ceph: mon0 10.55.147.70:6789 connection failed
> [17055.880130] ceph: mon0 10.55.147.70:6789 connection failed
> [17065.880113] ceph: mon0 10.55.147.70:6789 connection failed
> [17075.880170] ceph: mon0 10.55.147.70:6789 connection failed
> 
> The server is reachable, as the following command output shows:
> 
> $ nc 10.55.147.83 6789
> ceph v027

It looks like dmesg shows it trying to connect to the monitor at .70, but 
you tested .83?
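
One way to avoid this kind of mix-up is to take the address straight from the log line and compare it with the one that was probed by hand. The sketch below assumes log lines of exactly the form quoted above; the function names are hypothetical and nothing here talks to a monitor.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pull the "ip:port" token out of a kernel log line
// such as
//   "[17015.888143] ceph: mon0 10.55.147.70:6789 connection failed"
// Returns an empty string if the line does not look like that.
std::string mon_addr_from_dmesg(const std::string& line)
{
    const std::string tag = "ceph: mon";
    std::string::size_type pos = line.find(tag);
    if (pos == std::string::npos)
        return "";
    // Skip past "ceph: monN " to the start of the address.
    pos = line.find(' ', pos + tag.size());
    if (pos == std::string::npos)
        return "";
    ++pos;
    std::string::size_type end = line.find(' ', pos);
    return line.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}

// True when the address probed by hand (for example with nc) is the one
// the client is reporting as unreachable.
bool probed_same_monitor(const std::string& dmesg_line,
                         const std::string& probed_ip, int probed_port)
{
    assert(probed_port > 0 && probed_port <= 65535);
    return mon_addr_from_dmesg(dmesg_line) == probed_ip + ":" + std::to_string(probed_port);
}
```

For the lines quoted above, the extracted address is 10.55.147.70:6789, so a probe of 10.55.147.83 port 6789 does not count as testing the same monitor.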

> I started running the experiments with ceph 0.18 using the 
> configuration, where clients and servers run on separate nodes. It turns 
> out that the performance is extremely bad. Looking at dmesg trace I see 
> ceph-related faults (the partial trace is attached to the email).

The oops in the attached trace.txt was fixed last week in the unstable 
code.  It also looks like the IO is synchronous, which may have something 
to do with your performance.  Are you mounting with -o sync or using 
direct IO, or are multiple clients reading and writing to the same file or 
something?

Thanks-
sage


------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-10 21:39 ` Sage Weil
  2010-02-10 22:44   ` Talyansky, Roman
@ 2010-02-16 17:27   ` Talyansky, Roman
  2010-02-16 18:35     ` Sage Weil
  1 sibling, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-16 17:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 2516 bytes --]

Hi Sage,

I am trying to reproduce the hang with the latest client and servers.
I am able to start the servers, however mount fails with input/output error 5. The dmesg listing shows the following info:

[17008.244739] ceph: loaded 0.18.0 (mon/mds/osd proto 15/30/22)
[17015.888143] ceph: mon0 10.55.147.70:6789 connection failed
[17025.880170] ceph: mon0 10.55.147.70:6789 connection failed
[17035.880121] ceph: mon0 10.55.147.70:6789 connection failed
[17045.880189] ceph: mon0 10.55.147.70:6789 connection failed
[17055.880130] ceph: mon0 10.55.147.70:6789 connection failed
[17065.880113] ceph: mon0 10.55.147.70:6789 connection failed
[17075.880170] ceph: mon0 10.55.147.70:6789 connection failed

The server is reachable, as the following command output shows:

$ nc 10.55.147.83 6789
ceph v027

I started running the experiments with ceph 0.18 using the configuration, where clients and servers run on separate nodes. It turns out that the performance is extremely bad. Looking at dmesg trace I see ceph-related faults (the partial trace is attached to the email).

Any suggestions how to proceed are more than welcome.

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Wednesday, February 10, 2010 11:39 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Wed, 10 Feb 2010, Talyansky, Roman wrote:

> Hello,
> 
> Recently I ran three application instances simultaneously over a mounted CEPH file system and one of them got stuck in a write call.
> I had the following CEPH configuration:
> -       The nodes run Debian (lenny, unstable)
> -       Three nodes with osd servers
> -       Three client nodes
> -       One of the three client nodes ran on the same node as an osd server.
> 
> Can the origin of the problem be the client collocated with an osd server?

The collocated client+osd can theoretically cause problems when you run 
out of memory, but it doesn't sound like that's the case here.

> Can you help me to resolve this issue?

I assume the OSDs and MDS are all still running?

We fixed a number of bugs recently with multiple clients interacting with 
the same files.  Is the hang reproducible?  Can you try it with the latest 
unstable client and servers?  Or, enable mds debug logging and post that 
somewhere (debug mds = 20, debug ms = 1)?

Thanks-
sage

[-- Attachment #2: trace.txt --]
[-- Type: text/plain, Size: 4287 bytes --]

[112691.516538] general protection fault: 0000 [#73] SMP
[112691.520517] last sysfs file: /sys/devices/virtual/net/lo/operstate
[112691.520517] CPU 1
[112691.520517] Modules linked in: ceph crc32c libcrc32c nfs lockd fscache nfs_acl auth_rpcgss sunrpc autofs4 ext4 jbd2 crc16 loop parport_pc fschmd i2c_i801 parport i2c_core snd_hda_codec_realtek evdev tpm_infineon snd_hda_intel psmouse serio_raw tpm snd_hda_codec snd_pcsp snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc container tpm_bios processor ext3 jbd mbcache sg sd_mod crc_t10dif sr_mod cdrom ide_pci_generic ide_core ata_generic uhci_hcd floppy ata_piix button e1000e intel_agp agpgart libata ehci_hcd scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[112691.520517] Pid: 3780, comm: ioplayer2 Tainted: G      D    2.6.32-trunk-amd64 #1 ESPRIMO P5925
[112691.520517] RIP: 0010:[<ffffffffa03a6f16>]  [<ffffffffa03a6f16>] zero_user_segment+0x62/0x75 [ceph]
[112691.520517] RSP: 0018:ffff880037861c88  EFLAGS: 00010246
[112691.520517] RAX: 0000000000000000 RBX: 00000000fffa8d87 RCX: 0000000000001000
[112691.520517] RDX: 6db6db6db6db6db7 RSI: 0000000000000000 RDI: 76f19732eb7bc000
[112691.520517] RBP: 0000000000001000 R08: 3120393532383120 R09: ffffffff814390b0
[112691.520517] R10: ffff88010b157800 R11: ffff8800d6802000 R12: 00000000fffa8d86
[112691.520517] R13: 0000000000002000 R14: ffff88010a918b10 R15: ffff88010a918b10
[112691.520517] FS:  00007f5be27fc910(0000) GS:ffff880005100000(0000) knlGS:0000000000000000
[112691.520517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112691.520517] CR2: 00007f4bd3a9ca90 CR3: 0000000109461000 CR4: 00000000000006e0
[112691.520517] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[112691.520517] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[112691.520517] Process ioplayer2 (pid: 3780, threadinfo ffff880037860000, task ffff880037945bd0)
[112691.520517] Stack:
[112691.520517]  0000010000000230 ffffffffa03a6f93 0000000000000000 00000001a8d86000
[112691.520517] <0> 0000000000000001 0000000000000000 00000001a8d86000 ffffffffa03a7bf7
[112691.520517] <0> ffff880000000000 ffffffffffffffff ffff88010a918b10 ffff880100000002
[112691.520517] Call Trace:
[112691.520517]  [<ffffffffa03a6f93>] ? zero_page_vector_range+0x6a/0xa5 [ceph]
[112691.520517]  [<ffffffffa03a7bf7>] ? ceph_aio_read+0x33d/0x4aa [ceph]
[112691.520517]  [<ffffffff810ebf01>] ? do_sync_read+0xce/0x113
[112691.520517]  [<ffffffff810676d4>] ? hrtimer_try_to_cancel+0x3a/0x43
[112691.520517]  [<ffffffff81064aae>] ? autoremove_wake_function+0x0/0x2e
[112691.520517]  [<ffffffff810676e9>] ? hrtimer_cancel+0xc/0x16
[112691.520517]  [<ffffffff812e62aa>] ? do_nanosleep+0x6d/0xa3
[112691.520517]  [<ffffffff8103aa9a>] ? pick_next_task+0x24/0x3f
[112691.520517]  [<ffffffff810ec94a>] ? vfs_read+0xa6/0xff
[112691.520517]  [<ffffffff810eca5f>] ? sys_read+0x45/0x6e
[112691.520517]  [<ffffffff81010b02>] ? system_call_fastpath+0x16/0x1b
[112691.520517] Code: b6 6d db b6 6d 49 8d 04 00 89 f7 29 f1 48 c1 f8 03 48 0f af c2 48 c1 e0 0c 48 01 c7 48 b8 00 00 00 00 00 88 ff ff 48 01 c7 31 c0 <f3> aa 65 48 8b 04 25 c8 cb 00 00 ff 88 44 e0 ff ff 59 c3 41 56
[112691.520517] RIP  [<ffffffffa03a6f16>] zero_user_segment+0x62/0x75 [ceph]
[112691.520517]  RSP <ffff880037861c88>
[112691.871238] ---[ end trace 09486983a8cdbe04 ]---
[112691.876331] note: ioplayer2[3780] exited with preempt_count 1
[112691.881409] BUG: scheduling while atomic: ioplayer2/3780/0x10000001
[112691.886452] Modules linked in: ceph crc32c libcrc32c nfs lockd fscache nfs_acl auth_rpcgss sunrpc autofs4 ext4 jbd2 crc16 loop parport_pc fschmd i2c_i801 parport i2c_core snd_hda_codec_realtek evdev tpm_infineon snd_hda_intel psmouse serio_raw tpm snd_hda_codec snd_pcsp snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc container tpm_bios processor ext3 jbd mbcache sg sd_mod crc_t10dif sr_mod cdrom ide_pci_generic ide_core ata_generic uhci_hcd floppy ata_piix button e1000e intel_agp agpgart libata ehci_hcd scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[112691.928434] Pid: 3780, comm: ioplayer2 Tainted: G      D    2.6.32-trunk-amd64 #1
[112691.938802] Call Trace:

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-10 22:44   ` Talyansky, Roman
@ 2010-02-10 22:49     ` Sage Weil
  0 siblings, 0 replies; 27+ messages in thread
From: Sage Weil @ 2010-02-10 22:49 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: ceph-devel

On Wed, 10 Feb 2010, Talyansky, Roman wrote:
> > Or, enable mds debug logging and post that somewhere (debug mds = 20, 
> > debug ms = 1)?
> Should I place these two lines into the ceph.conf file in the [mds] section?

Yes.
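
That is, something like the following fragment in ceph.conf, using the two option lines quoted above (a sketch of the relevant section only; the rest of the file is left as it is):

```ini
[mds]
debug mds = 20
debug ms = 1
```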

Thanks-
sage

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-10 21:39 ` Sage Weil
@ 2010-02-10 22:44   ` Talyansky, Roman
  2010-02-10 22:49     ` Sage Weil
  2010-02-16 17:27   ` Talyansky, Roman
  1 sibling, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-10 22:44 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

Thanks for the reply.

> I assume the OSDs and MDS are all still running?
They are not running. Since I had not enabled tracing for ceph, I decided to reproduce the hang with tracing enabled, and I am currently trying to do so.

> Can you try it with the latest unstable client and servers?
I will definitely try.

> Or, enable mds debug logging and post that somewhere (debug mds = 20, debug ms = 1)?
Should I place these two lines into the ceph.conf file in the [mds] section?

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Wednesday, February 10, 2010 11:39 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Wed, 10 Feb 2010, Talyansky, Roman wrote:

> Hello,
> 
> Recently I ran three application instances simultaneously over a mounted CEPH file system and one of them got stuck in a write call.
> I had the following CEPH configuration:
> -       The nodes run Debian (lenny, unstable)
> -       Three nodes with osd servers
> -       Three client nodes
> -       One of the three client nodes ran on the same node as an osd server.
> 
> Can the origin of the problem be the client collocated with an osd server?

The collocated client+osd can theoretically cause problems when you run 
out of memory, but it doesn't sound like that's the case here.

> Can you help me to resolve this issue?

I assume the OSDs and MDS are all still running?

We fixed a number of bugs recently with multiple clients interacting with 
the same files.  Is the hang reproducible?  Can you try it with the latest 
unstable client and servers?  Or, enable mds debug logging and post that 
somewhere (debug mds = 20, debug ms = 1)?

Thanks-
sage

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Write operation is stuck
  2010-02-10 21:26 Talyansky, Roman
@ 2010-02-10 21:39 ` Sage Weil
  2010-02-10 22:44   ` Talyansky, Roman
  2010-02-16 17:27   ` Talyansky, Roman
  0 siblings, 2 replies; 27+ messages in thread
From: Sage Weil @ 2010-02-10 21:39 UTC (permalink / raw)
  To: Talyansky, Roman; +Cc: ceph-devel

Hi Roman,

On Wed, 10 Feb 2010, Talyansky, Roman wrote:

> Hello,
> 
> Recently I ran three application instances simultaneously over a mounted CEPH file system and one of them got stuck in a write call.
> I had the following CEPH configuration:
> -       The nodes run Debian (lenny, unstable)
> -       Three nodes with osd servers
> -       Three client nodes
> -       One of the three client nodes ran on the same node as an osd server.
> 
> Can the origin of the problem be the client collocated with an osd server?

The collocated client+osd can theoretically cause problems when you run 
out of memory, but it doesn't sound like that's the case here.

> Can you help me to resolve this issue?

I assume the OSDs and MDS are all still running?

We fixed a number of bugs recently with multiple clients interacting with 
the same files.  Is the hang reproducible?  Can you try it with the latest 
unstable client and servers?  Or, enable mds debug logging and post that 
somewhere (debug mds = 20, debug ms = 1)?
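
A minimal sketch of how those two settings might look in ceph.conf. Placing them 
in the [mds] section is an assumption (the message above only names the options), 
and the MDS would typically need a restart to pick them up:

```ini
[mds]
        ; verbose MDS logging, level taken from the message above
        debug mds = 20
        ; messenger logging at level 1, also from the message above
        debug ms = 1
```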

Thanks-
sage

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Write operation is stuck
@ 2010-02-10 21:26 Talyansky, Roman
  2010-02-10 21:39 ` Sage Weil
  0 siblings, 1 reply; 27+ messages in thread
From: Talyansky, Roman @ 2010-02-10 21:26 UTC (permalink / raw)
  To: ceph-devel


[-- Attachment #1.1: Type: text/plain, Size: 703 bytes --]

Hello,

Recently I ran three application instances simultaneously over a mounted CEPH file system and one of them got stuck calling a write operation.
I had the following CEPH configuration:
-       The nodes have Debian installation - lenny, unstable
-       Three nodes with osd servers
-       Three client nodes
-       One client node among the three mentioned above was located at a node where an osd server ran.

Can the origin of the problem be the client collocated with an osd server?
Can you help me to resolve this issue?

Thanks and regards,

Roman

--

Roman Talyansky
SAP Research, Israel

T +972 777 5538
M +972 3388 032
mailto:roman.talyansky@sap.com





[-- Attachment #1.2: Type: text/html, Size: 1655 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-09-03 19:20 UTC | newest]

Thread overview: 27+ messages
2010-08-27 12:18 Write operation is stuck Bogdan Lobodzinski
2010-08-27 15:42 ` Wido den Hollander
2010-08-27 16:09 ` Sage Weil
2010-08-30 15:32   ` Bogdan Lobodzinski
2010-08-30 19:39     ` Sage Weil
2010-08-31  7:56       ` Bogdan Lobodzinski
2010-09-01 15:21         ` Bogdan Lobodzinski
2010-09-01 19:29           ` Wido den Hollander
2010-09-03 15:02             ` Bogdan Lobodzinski
2010-09-03 17:10               ` Yehuda Sadeh Weinraub
2010-09-03 19:20                 ` Yehuda Sadeh Weinraub
  -- strict thread matches above, loose matches on Subject: below --
2010-02-10 21:26 Talyansky, Roman
2010-02-10 21:39 ` Sage Weil
2010-02-10 22:44   ` Talyansky, Roman
2010-02-10 22:49     ` Sage Weil
2010-02-16 17:27   ` Talyansky, Roman
2010-02-16 18:35     ` Sage Weil
2010-02-19 15:40       ` Talyansky, Roman
2010-02-19 18:39         ` Sage Weil
2010-02-23 14:11           ` Talyansky, Roman
2010-02-23 18:11             ` Yehuda Sadeh Weinraub
2010-02-24 13:34               ` Talyansky, Roman
2010-02-24 14:56                 ` Sage Weil
2010-02-24 16:42                   ` Talyansky, Roman
2010-02-24 18:43                     ` Sage Weil
2010-02-24 23:21                       ` Talyansky, Roman
2010-02-25 10:07                       ` Talyansky, Roman