* Ceph hangs when accessed
       [not found] <1126025760.301101.1316793469807.JavaMail.root@zmbs4.inria.fr>
@ 2011-09-23 15:58 ` Cedric Morandin
  2011-09-23 17:20   ` Wido den Hollander
  0 siblings, 1 reply; 6+ messages in thread
From: Cedric Morandin @ 2011-09-23 15:58 UTC (permalink / raw)
  To: ceph-devel

Hi everybody,

I didn't find any ceph-users list, so I'm posting here. If this is not the right place, please let me know.
I'm currently trying to test Ceph, but I'm probably doing something wrong because I'm seeing really strange behavior.

Context:
Ceph compiled and installed on five CentOS 6 machines.
A btrfs partition is available on each machine.
This partition is mounted under /data/osd.[0-3].
Clients use cfuse compiled for FC11 (kernel 2.6.29.4-167.fc11.x86_64).

What happens:
I configured everything in ceph.conf and started the Ceph daemons on all nodes.
When I issue ceph health, I get a HEALTH_OK answer.
I can access the filesystem through cfuse and create some files on it, but when I try to create files bigger than 2 or 3 MB, the filesystem hangs (example commands below).
When I try to copy an entire directory (the Ceph sources, for instance), I have the same problem.
When the system is in this state, the cosd daemon dies on the OSD machines: [INF] osd0 out (down for 304.836218)
Even killing it doesn't release the mountpoint:
cosd       9170      root   10uW     REG                8,6          8    2506754 /data/osd.0/fsid
cosd       9170      root   11r      DIR                8,6       4096    2506753 /data/osd.0
cosd       9170      root   12r      DIR                8,6      24576    2506755 /data/osd.0/current
cosd       9170      root   13u      REG                8,6          4    2506757 /data/osd.0/current/commit_op_seq
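For reference, writing even a modest amount of data through cfuse triggers the hang, and the open-files listing above came from lsof; something like the following (the /mnt/ceph mountpoint is illustrative, not my exact path):

    # write ~10 MB through the cfuse mount; enough to trigger the hang
    dd if=/dev/zero of=/mnt/ceph/testfile bs=1M count=10
    # see which files the dead cosd still holds open under its data directory
    lsof +D /data/osd.0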


I tried changing some parameters, but the result is always the same problem:
I tried both the 0.34 and 0.35 releases, and both btrfs and ext3 with the user_xattr mount option (fstab sketch below).
I also tried the cfuse client on one of the CentOS 6 machines.
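For ext3 I enabled user_xattr at mount time; the fstab entry looked roughly like this (device name is illustrative):

    # illustrative device; mountpoint matches the OSD data directory
    /dev/sdb1  /data/osd.0  ext3  rw,noatime,user_xattr  0  2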

I read everything on http://ceph.newdream.net/wiki but I can't figure out the problem.
Does somebody have any clue about the problem's origin?

Regards,

Cedric Morandin 



-- 
Cédric Morandin -  OASIS Research Team
INRIA Sophia Antipolis
2004 route des lucioles - BP 93
06902  Sophia-Antipolis (France)
Phone: +33 4 97 15 53 89



* Re: Ceph hangs when accessed
  2011-09-23 15:58 ` Ceph hangs when accessed Cedric Morandin
@ 2011-09-23 17:20   ` Wido den Hollander
  2011-09-26 21:23     ` Cédric Morandin
  0 siblings, 1 reply; 6+ messages in thread
From: Wido den Hollander @ 2011-09-23 17:20 UTC (permalink / raw)
  To: Cedric Morandin; +Cc: ceph-devel

Hi.

Could you send us your ceph.conf and the output of "ceph -s"?

Wido

On Fri, 2011-09-23 at 17:58 +0200, Cedric Morandin wrote:
> [...]




* Re: Ceph hangs when accessed
  2011-09-23 17:20   ` Wido den Hollander
@ 2011-09-26 21:23     ` Cédric Morandin
  2011-09-26 23:56       ` huang jun
  2011-09-27 16:32       ` Tommi Virtanen
  0 siblings, 2 replies; 6+ messages in thread
From: Cédric Morandin @ 2011-09-26 21:23 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel

Hi Wido,

Thanks for your answer and your kind help.
I've tried to include all the useful information below, but something may be missing.
Let me know if you want me to run more tests.

Please find the output of ceph -s below:
[root@node91 ~]# ceph -s
2011-09-26 22:48:08.048659    pg v297: 792 pgs: 792 active+clean; 24 KB data, 80512 KB used, 339 GB / 340 GB avail
2011-09-26 22:48:08.049742   mds e5: 1/1/1 up {0=alpha=up:active}, 1 up:standby
2011-09-26 22:48:08.049764   osd e5: 4 osds: 4 up, 4 in
2011-09-26 22:48:08.049800   log 2011-09-26 19:38:14.372125 osd3 138.96.126.95:6800/2973 242 : [INF] 2.1p3 scrub ok
2011-09-26 22:48:08.049847   mon e1: 3 mons at {alpha=138.96.126.91:6789/0,beta=138.96.126.92:6789/0,gamma=138.96.126.93:6789/0}

The same command, run ten minutes after cfuse hung on the client node:

[root@node91 ~]# ceph -s
2011-09-26 23:07:49.403774    pg v335: 792 pgs: 101 active, 276 active+clean, 415 active+clean+degraded; 4806 KB data, 114 MB used, 339 GB / 340 GB avail; 24/56 degraded (42.857%)
2011-09-26 23:07:49.404847   mds e5: 1/1/1 up {0=alpha=up:active}, 1 up:standby
2011-09-26 23:07:49.404867   osd e13: 4 osds: 2 up, 4 in
2011-09-26 23:07:49.404929   log 2011-09-26 23:07:46.093670 mds0 138.96.126.91:6800/4682 2 : [INF] closing stale session client4124 138.96.126.91:0/5563 after 455.778957
2011-09-26 23:07:49.404966   mon e1: 3 mons at {alpha=138.96.126.91:6789/0,beta=138.96.126.92:6789/0,gamma=138.96.126.93:6789/0}

[root@node91 ~]# /etc/init.d/ceph -a status
=== mon.alpha === 
running...
=== mon.beta === 
running...
=== mon.gamma === 
running...
=== mds.alpha === 
running...
=== mds.beta === 
running...
=== osd.0 === 
dead.
=== osd.1 === 
running...
=== osd.2 === 
running...
=== osd.3 === 
dead.
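(The dead daemons can presumably be brought back with the same init script; assuming it accepts a single daemon name, something like:

    /etc/init.d/ceph -a start osd.0

though I haven't verified that invocation.)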

Finally, here are the last lines of the osd.0 log:

2011-09-26 22:57:06.822182 7faf6a6f8700 -- 138.96.126.92:6802/3157 >> 138.96.126.93:6801/3162 pipe(0x7faf50001320 sd=20 pgs=0 cs=0 l=0).accept connect_seq 2 vs existing 1 state 3
2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed out after 600 seconds.
 ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
2011-09-26 23:07:09.084934 1: (SafeTimer::timer_thread()+0x323) [0x5c95a3]
2011-09-26 23:07:09.084943 2: (SafeTimerThread::entry()+0xd) [0x5cbc7d]
2011-09-26 23:07:09.084950 3: /lib64/libpthread.so.0() [0x31fec077e1]
2011-09-26 23:07:09.084957 4: (clone()+0x6d) [0x31fe4e18ed]
2011-09-26 23:07:09.084963 *** Caught signal (Aborted) **
 in thread 0x7faf8e1b5700
 ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
 1: /usr/bin/cosd() [0x649ca9]
 2: /lib64/libpthread.so.0() [0x31fec0f4c0]
 3: (gsignal()+0x35) [0x31fe4329a5]
 4: (abort()+0x175) [0x31fe434185]
 5: (__assert_fail()+0xf5) [0x31fe42b935]
 6: (SyncEntryTimeout::finish(int)+0x130) [0x683400]
 7: (SafeTimer::timer_thread()+0x323) [0x5c95a3]
 8: (SafeTimerThread::entry()+0xd) [0x5cbc7d]
 9: /lib64/libpthread.so.0() [0x31fec077e1]
 10: (clone()+0x6d) [0x31fe4e18ed]

ceph.conf:

[global]
        max open files = 131072
        log file = /var/log/ceph/$name.log
        pid file = /var/run/ceph/$name.pid
[mon]
        mon data = /data/$name
        mon clock drift allowed = 1
[mon.alpha]
        host = node91
        mon addr = 138.96.126.91:6789
[mon.beta]
        host = node92
        mon addr = 138.96.126.92:6789
[mon.gamma]
        host = node93
        mon addr = 138.96.126.93:6789
[mds]
        keyring = /data/keyring.$name
[mds.alpha]
        host = node91
[mds.beta]
        host = node92
[osd]
        osd data = /data/$name
        osd journal = /data/$name/journal
        osd journal size = 1000
[osd.0]
        host = node92
[osd.1]
        host = node93
[osd.2]
        host = node94
[osd.3]
        host = node95
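(For clarity: $name expands to each daemon's type and id, so for osd.0 the [osd] section above resolves to roughly:

        osd data = /data/osd.0
        osd journal = /data/osd.0/journal
        log file = /var/log/ceph/osd.0.log

which matches the /data/osd.[0-3] mountpoints described earlier.)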

----

Thank you once again for your help.

Regards

Cédric

On 23 Sep 2011, at 19:20, Wido den Hollander wrote:

> Hi.
> 
> Could you send us your ceph.conf and the output of "ceph -s"?
> 
> Wido
> 
> On Fri, 2011-09-23 at 17:58 +0200, Cedric Morandin wrote:
>> [...]


* Re: Ceph hangs when accessed
  2011-09-26 21:23     ` Cédric Morandin
@ 2011-09-26 23:56       ` huang jun
  2011-09-27 16:32       ` Tommi Virtanen
  1 sibling, 0 replies; 6+ messages in thread
From: huang jun @ 2011-09-26 23:56 UTC (permalink / raw)
  To: Cédric Morandin; +Cc: Wido den Hollander, ceph-devel

2011/9/27 Cédric Morandin <cedric.morandin@inria.fr>:
> [...]
> 2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed out after 600 seconds.
>  ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
> [...]
Maybe the underlying fs (btrfs/ext4) is busy or hung, which made the sync commit take more than 600 seconds; the OSD then assumes it is dead and uses ceph_abort to terminate the cosd process.
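If the filesystem is just slow rather than completely wedged, you could also try raising that timeout in ceph.conf. I believe the knob is the FileStore commit timeout (600 seconds by default, which matches the log line above), but please double-check the exact option name:

    [osd]
            filestore commit timeout = 1800    ; default is 600 seconds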

* Re: Ceph hangs when accessed
  2011-09-26 21:23     ` Cédric Morandin
  2011-09-26 23:56       ` huang jun
@ 2011-09-27 16:32       ` Tommi Virtanen
  2011-09-29 15:22         ` Cedric Morandin
  1 sibling, 1 reply; 6+ messages in thread
From: Tommi Virtanen @ 2011-09-27 16:32 UTC (permalink / raw)
  To: Cédric Morandin; +Cc: Wido den Hollander, ceph-devel

On Mon, Sep 26, 2011 at 14:23, Cédric Morandin <cedric.morandin@inria.fr> wrote:
> 2011-09-26 23:07:49.404867   osd e13: 4 osds: 2 up, 4 in
...
> 2011-09-26 22:57:06.822182 7faf6a6f8700 -- 138.96.126.92:6802/3157 >> 138.96.126.93:6801/3162 pipe(0x7faf50001320 sd=20 pgs=0 cs=0 l=0).accept connect_seq 2 vs existing 1 state 3
> 2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed out after 600 seconds.
>  ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)

And earlier you said the OSDs are using btrfs. That definitely sounds
like a btrfs bug, then.

Do the OSD machines have anything interesting in dmesg or /var/log/kern.log?
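For example, something along these lines on each OSD node:

    # look for btrfs complaints around the time of the hang
    dmesg | grep -i btrfs
    grep -i btrfs /var/log/kern.log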

You may want to try a newer kernel, or run on ext4 for now.

* Re: Ceph hangs when accessed
  2011-09-27 16:32       ` Tommi Virtanen
@ 2011-09-29 15:22         ` Cedric Morandin
  0 siblings, 0 replies; 6+ messages in thread
From: Cedric Morandin @ 2011-09-29 15:22 UTC (permalink / raw)
  To: Tommi Virtanen; +Cc: Wido den Hollander, ceph-devel

Hello Tommi,

I followed your advice and tried ext4.
Everything works fine with ext4.
I'll try a newer version of btrfs when I have time.

I paste below the btrfs-related trace that appeared in /var/log/messages while the problem was occurring:

Sep 26 23:04:51 node95 kernel: INFO: task cosd:2988 blocked for more than 120 seconds.
Sep 26 23:04:51 node95 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 26 23:04:51 node95 kernel: cosd          D ffff880c3fc25700     0  2988      1 0x00000080
Sep 26 23:04:51 node95 kernel: ffff8817fb919bc8 0000000000000082 0000000000000000 ffffffffa0175fa1
Sep 26 23:04:51 node95 kernel: 0000000000000000 ffff8817f844f448 ffffffffa015df40 0000000100d0253c
Sep 26 23:04:51 node95 kernel: ffff881806d05a98 ffff8817fb919fd8 0000000000010518 ffff881806d05a98
Sep 26 23:04:51 node95 kernel: Call Trace:
Sep 26 23:04:51 node95 kernel: [<ffffffffa0175fa1>] ? extent_writepages+0x51/0x60 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa015df40>] ? btrfs_get_extent+0x0/0x8b0 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa016ef8d>] btrfs_start_ordered_extent+0x6d/0xc0 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:04:51 node95 kernel: [<ffffffffa016f16b>] btrfs_wait_ordered_extents+0x12b/0x1e0 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa015336f>] btrfs_commit_transaction+0x20f/0x710 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:04:51 node95 kernel: [<ffffffffa01801b6>] btrfs_mksubvol+0x2d6/0x350 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa0180343>] btrfs_ioctl_snap_create+0x113/0x160 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffffa0181d9a>] btrfs_ioctl+0x4ca/0x970 [btrfs]
Sep 26 23:04:51 node95 kernel: [<ffffffff8117f182>] vfs_ioctl+0x22/0xa0
Sep 26 23:04:51 node95 kernel: [<ffffffff81059d12>] ? finish_task_switch+0x42/0xd0
Sep 26 23:04:51 node95 kernel: [<ffffffff8117f324>] do_vfs_ioctl+0x84/0x580
Sep 26 23:04:51 node95 kernel: [<ffffffff8116c892>] ? vfs_write+0x132/0x1a0
Sep 26 23:04:51 node95 kernel: [<ffffffff8117f8a1>] sys_ioctl+0x81/0xa0
Sep 26 23:04:51 node95 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Sep 26 23:06:51 node95 kernel: INFO: task btrfs-transacti:1093 blocked for more than 120 seconds.
Sep 26 23:06:51 node95 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 26 23:06:51 node95 kernel: btrfs-transac D ffff880c3fc25700     0  1093      2 0x00000000
Sep 26 23:06:51 node95 kernel: ffff880c070e5d50 0000000000000046 0000000000000000 ffffffff81059d12
Sep 26 23:06:51 node95 kernel: 0000000000000000 0000000000016980 0000000000000000 0000000100d09a6f
Sep 26 23:06:51 node95 kernel: ffff880c058fd028 ffff880c070e5fd8 0000000000010518 ffff880c058fd028
Sep 26 23:06:51 node95 kernel: Call Trace:
Sep 26 23:06:51 node95 kernel: [<ffffffff81059d12>] ? finish_task_switch+0x42/0xd0
Sep 26 23:06:51 node95 kernel: [<ffffffffa0151ec9>] wait_for_commit+0x89/0xf0 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:06:51 node95 kernel: [<ffffffffa015374e>] btrfs_commit_transaction+0x5ee/0x710 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff814c963e>] ? mutex_lock+0x1e/0x50
Sep 26 23:06:51 node95 kernel: [<ffffffffa0153c8b>] ? start_transaction+0x1ab/0x230 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Sep 26 23:06:51 node95 kernel: [<ffffffffa014d9ab>] transaction_kthread+0x26b/0x280 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffffa014d740>] ? transaction_kthread+0x0/0x280 [btrfs]
Sep 26 23:06:51 node95 kernel: [<ffffffff81091936>] kthread+0x96/0xa0
Sep 26 23:06:51 node95 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Sep 26 23:06:51 node95 kernel: [<ffffffff810918a0>] ? kthread+0x0/0xa0
Sep 26 23:06:51 node95 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20

Regards

Cédric

----- Original Message -----
> From: "Tommi Virtanen" <tommi.virtanen@dreamhost.com>
> To: "Cédric Morandin" <cedric.morandin@inria.fr>
> Cc: "Wido den Hollander" <wido@widodh.nl>, ceph-devel@vger.kernel.org
> Sent: Tuesday, 27 September 2011 18:32:24
> Subject: Re: Ceph hangs when accessed
> On Mon, Sep 26, 2011 at 14:23, Cédric Morandin <cedric.morandin@inria.fr> wrote:
> > 2011-09-26 23:07:49.404867 osd e13: 4 osds: 2 up, 4 in
> ...
> > 2011-09-26 22:57:06.822182 7faf6a6f8700 -- 138.96.126.92:6802/3157 >> 138.96.126.93:6801/3162 pipe(0x7faf50001320 sd=20 pgs=0 cs=0 l=0).accept connect_seq 2 vs existing 1 state 3
> > 2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed out after 600 seconds.
> >  ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
>
> And earlier you said the OSDs are using btrfs. That definitely sounds
> like a btrfs bug, then.
>
> Do the OSD machines have anything interesting in dmesg or /var/log/kern.log?
>
> You may want to try a newer kernel, or run on ext4 for now.

