* MDS crash
@ 2011-05-23 21:21 Fyodor Ustinov
  2011-05-23 22:27 ` Sage Weil
  2011-05-24 23:54 ` Sage Weil
  0 siblings, 2 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-05-23 21:21 UTC (permalink / raw)
  To: ceph-devel

Hi.

2011-05-24 00:17:45.490684 7f45415e1740 ceph version 0.28.commit: 
071881d7e5599571e46bda17094bb4b48691e89a. process: cmds. pid: 4424
2011-05-24 00:17:45.492293 7f453ef81700 mds-1.0 ms_handle_connect on 
77.120.112.193:6789/0
2011-05-24 00:17:49.497862 7f453ef81700 mds-1.0 handle_mds_map standby
2011-05-24 00:17:53.274911 7f453ef81700 mds0.5 handle_mds_map i am now 
mds0.5
2011-05-24 00:17:53.274939 7f453ef81700 mds0.5 handle_mds_map state 
change up:standby --> up:replay
2011-05-24 00:17:53.274951 7f453ef81700 mds0.5 replay_start
2011-05-24 00:17:53.274962 7f453ef81700 mds0.5  recovery set is
2011-05-24 00:17:53.274969 7f453ef81700 mds0.5  need osdmap epoch 104, 
have 103
2011-05-24 00:17:53.274985 7f453ef81700 mds0.5  waiting for osdmap 104 
(which blacklists prior instance)
2011-05-24 00:17:53.275016 7f453ef81700 mds0.cache handle_mds_failure 
mds0 : recovery peers are
2011-05-24 00:17:53.276145 7f453ef81700 mds0.5 ms_handle_connect on 
77.120.112.201:6800/29765
2011-05-24 00:17:53.276223 7f453ef81700 mds0.5 ms_handle_connect on 
82.144.220.71:6800/5210
2011-05-24 00:17:53.276785 7f453ef81700 mds0.5 ms_handle_connect on 
82.144.220.72:6800/3960
2011-05-24 00:17:53.301249 7f453ef81700 mds0.5 ms_handle_connect on 
82.144.220.70:6800/25341
2011-05-24 00:17:53.307286 7f453ef81700 mds0.cache creating system inode 
with ino:100
2011-05-24 00:17:53.307441 7f453ef81700 mds0.cache creating system inode 
with ino:1
2011-05-24 00:17:53.308273 7f453ef81700 mds0.5 ms_handle_connect on 
77.120.112.200:6800/9187
2011-05-24 00:17:54.506400 7f4537fff700 mds0.5 replay_done
2011-05-24 00:17:54.506431 7f4537fff700 mds0.5 making mds journal writeable
2011-05-24 00:17:54.511104 7f453ef81700 mds0.5 handle_mds_map i am now 
mds0.5
2011-05-24 00:17:54.511127 7f453ef81700 mds0.5 handle_mds_map state 
change up:replay --> up:reconnect
2011-05-24 00:17:54.511138 7f453ef81700 mds0.5 reconnect_start
2011-05-24 00:17:54.511144 7f453ef81700 mds0.5 reopen_log
2011-05-24 00:17:54.511163 7f453ef81700 mds0.server reconnect_clients -- 
1 sessions
2011-05-24 00:17:54.511832 7f453c472700 -- 77.120.112.193:6800/4424 >> 
77.120.112.209:0/3638704563 pipe(0x10f8370 sd=11 pgs=0 cs=0 l=0).accept 
peer addr is really 77.120.112.209:0/3638704563 (socket is 
77.120.112.209:38599/0)
2011-05-24 00:17:54.512859 7f453ef81700 log [DBG] : reconnect by 
client4404 77.120.112.209:0/3638704563 after 0.001651
2011-05-24 00:17:54.513057 7f453ef81700 mds0.server missing 1000000860a 
#10000008019/vtapes/drive0/data (mine), will load later
2011-05-24 00:17:54.513091 7f453ef81700 mds0.5 reconnect_done
2011-05-24 00:17:54.515176 7f453ef81700 mds0.5 handle_mds_map i am now 
mds0.5
2011-05-24 00:17:54.515193 7f453ef81700 mds0.5 handle_mds_map state 
change up:reconnect --> up:rejoin
2011-05-24 00:17:54.515201 7f453ef81700 mds0.5 rejoin_joint_start
2011-05-24 00:17:54.522602 7f453ef81700 mds0.5 rejoin_done
2011-05-24 00:17:54.528794 7f453ef81700 mds0.5 handle_mds_map i am now 
mds0.5
2011-05-24 00:17:54.528812 7f453ef81700 mds0.5 handle_mds_map state 
change up:rejoin --> up:active
2011-05-24 00:17:54.528819 7f453ef81700 mds0.5 recovery_done -- 
successful recovery!
2011-05-24 00:17:54.529315 7f453ef81700 mds0.5 active_start
2011-05-24 00:17:54.531405 7f453ef81700 mds0.5 cluster recovered.
*** Caught signal (Segmentation fault) **
  in thread 0x7f453ef81700
  ceph version 0.28 (commit:071881d7e5599571e46bda17094bb4b48691e89a)
  1: /usr/bin/cmds() [0x712c5e]
  2: (()+0xfc60) [0x7f45411c0c60]
  3: (MDCache::get_or_create_stray_dentry(CInode*)+0x25) [0x5356f5]
  4: (Server::handle_client_unlink(MDRequest*)+0x997) [0x508857]
  5: (Server::handle_client_request(MClientRequest*)+0x522) [0x520852]
  6: (MDS::handle_deferrable_message(Message*)+0x9af) [0x4a266f]
  7: (MDS::_dispatch(Message*)+0x173e) [0x4b617e]
  8: (MDS::_dispatch(Message*)+0x427) [0x4b4e67]
  9: (MDS::ms_dispatch(Message*)+0x59) [0x4b66c9]
  10: (SimpleMessenger::dispatch_entry()+0x7ea) [0x4838aa]
  11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x47b26c]
  12: (()+0x6d8c) [0x7f45411b7d8c]
  13: (clone()+0x6d) [0x7f454006a04d]

WBR,
     Fyodor.


* Re: MDS crash
  2011-05-23 21:21 MDS crash Fyodor Ustinov
@ 2011-05-23 22:27 ` Sage Weil
  2011-05-23 22:45   ` Fyodor Ustinov
  2011-05-24  0:32   ` Fyodor Ustinov
  2011-05-24 23:54 ` Sage Weil
  1 sibling, 2 replies; 16+ messages in thread
From: Sage Weil @ 2011-05-23 22:27 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

Hi Fyodor,

This looks like #1104.  Will try to sort it out today, it should be a 
simple one.

sage


On Tue, 24 May 2011, Fyodor Ustinov wrote:

> Hi.
> 
> 2011-05-24 00:17:45.490684 7f45415e1740 ceph version 0.28.commit:
> 071881d7e5599571e46bda17094bb4b48691e89a. process: cmds. pid: 4424
> 2011-05-24 00:17:45.492293 7f453ef81700 mds-1.0 ms_handle_connect on
> 77.120.112.193:6789/0
> 2011-05-24 00:17:49.497862 7f453ef81700 mds-1.0 handle_mds_map standby
> 2011-05-24 00:17:53.274911 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
> 2011-05-24 00:17:53.274939 7f453ef81700 mds0.5 handle_mds_map state change
> up:standby --> up:replay
> 2011-05-24 00:17:53.274951 7f453ef81700 mds0.5 replay_start
> 2011-05-24 00:17:53.274962 7f453ef81700 mds0.5  recovery set is
> 2011-05-24 00:17:53.274969 7f453ef81700 mds0.5  need osdmap epoch 104, have
> 103
> 2011-05-24 00:17:53.274985 7f453ef81700 mds0.5  waiting for osdmap 104 (which
> blacklists prior instance)
> 2011-05-24 00:17:53.275016 7f453ef81700 mds0.cache handle_mds_failure mds0 :
> recovery peers are
> 2011-05-24 00:17:53.276145 7f453ef81700 mds0.5 ms_handle_connect on
> 77.120.112.201:6800/29765
> 2011-05-24 00:17:53.276223 7f453ef81700 mds0.5 ms_handle_connect on
> 82.144.220.71:6800/5210
> 2011-05-24 00:17:53.276785 7f453ef81700 mds0.5 ms_handle_connect on
> 82.144.220.72:6800/3960
> 2011-05-24 00:17:53.301249 7f453ef81700 mds0.5 ms_handle_connect on
> 82.144.220.70:6800/25341
> 2011-05-24 00:17:53.307286 7f453ef81700 mds0.cache creating system inode with
> ino:100
> 2011-05-24 00:17:53.307441 7f453ef81700 mds0.cache creating system inode with
> ino:1
> 2011-05-24 00:17:53.308273 7f453ef81700 mds0.5 ms_handle_connect on
> 77.120.112.200:6800/9187
> 2011-05-24 00:17:54.506400 7f4537fff700 mds0.5 replay_done
> 2011-05-24 00:17:54.506431 7f4537fff700 mds0.5 making mds journal writeable
> 2011-05-24 00:17:54.511104 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
> 2011-05-24 00:17:54.511127 7f453ef81700 mds0.5 handle_mds_map state change
> up:replay --> up:reconnect
> 2011-05-24 00:17:54.511138 7f453ef81700 mds0.5 reconnect_start
> 2011-05-24 00:17:54.511144 7f453ef81700 mds0.5 reopen_log
> 2011-05-24 00:17:54.511163 7f453ef81700 mds0.server reconnect_clients -- 1
> sessions
> 2011-05-24 00:17:54.511832 7f453c472700 -- 77.120.112.193:6800/4424 >>
> 77.120.112.209:0/3638704563 pipe(0x10f8370 sd=11 pgs=0 cs=0 l=0).accept peer
> addr is really 77.120.112.209:0/3638704563 (socket is 77.120.112.209:38599/0)
> 2011-05-24 00:17:54.512859 7f453ef81700 log [DBG] : reconnect by client4404
> 77.120.112.209:0/3638704563 after 0.001651
> 2011-05-24 00:17:54.513057 7f453ef81700 mds0.server missing 1000000860a
> #10000008019/vtapes/drive0/data (mine), will load later
> 2011-05-24 00:17:54.513091 7f453ef81700 mds0.5 reconnect_done
> 2011-05-24 00:17:54.515176 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
> 2011-05-24 00:17:54.515193 7f453ef81700 mds0.5 handle_mds_map state change
> up:reconnect --> up:rejoin
> 2011-05-24 00:17:54.515201 7f453ef81700 mds0.5 rejoin_joint_start
> 2011-05-24 00:17:54.522602 7f453ef81700 mds0.5 rejoin_done
> 2011-05-24 00:17:54.528794 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
> 2011-05-24 00:17:54.528812 7f453ef81700 mds0.5 handle_mds_map state change
> up:rejoin --> up:active
> 2011-05-24 00:17:54.528819 7f453ef81700 mds0.5 recovery_done -- successful
> recovery!
> 2011-05-24 00:17:54.529315 7f453ef81700 mds0.5 active_start
> 2011-05-24 00:17:54.531405 7f453ef81700 mds0.5 cluster recovered.
> *** Caught signal (Segmentation fault) **
>  in thread 0x7f453ef81700
>  ceph version 0.28 (commit:071881d7e5599571e46bda17094bb4b48691e89a)
>  1: /usr/bin/cmds() [0x712c5e]
>  2: (()+0xfc60) [0x7f45411c0c60]
>  3: (MDCache::get_or_create_stray_dentry(CInode*)+0x25) [0x5356f5]
>  4: (Server::handle_client_unlink(MDRequest*)+0x997) [0x508857]
>  5: (Server::handle_client_request(MClientRequest*)+0x522) [0x520852]
>  6: (MDS::handle_deferrable_message(Message*)+0x9af) [0x4a266f]
>  7: (MDS::_dispatch(Message*)+0x173e) [0x4b617e]
>  8: (MDS::_dispatch(Message*)+0x427) [0x4b4e67]
>  9: (MDS::ms_dispatch(Message*)+0x59) [0x4b66c9]
>  10: (SimpleMessenger::dispatch_entry()+0x7ea) [0x4838aa]
>  11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x47b26c]
>  12: (()+0x6d8c) [0x7f45411b7d8c]
>  13: (clone()+0x6d) [0x7f454006a04d]
> 
> WBR,
>     Fyodor.


* Re: MDS crash
  2011-05-23 22:27 ` Sage Weil
@ 2011-05-23 22:45   ` Fyodor Ustinov
  2011-05-23 23:08     ` Sage Weil
  2011-05-24  0:32   ` Fyodor Ustinov
  1 sibling, 1 reply; 16+ messages in thread
From: Fyodor Ustinov @ 2011-05-23 22:45 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 05/24/2011 01:27 AM, Sage Weil wrote:
> Hi Fyodor,
>
> This looks like #1104.  Will try to sort it out today, it should be a
> simple one.
>
> sage
Do you need my "debug mds = 20, debug ms = 1" log?
It is zipped - 26M.

WBR,
     Fyodor.
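Those two settings are ordinary ceph.conf options; a minimal sketch of where they would go (section placement assumed; the mds would need a restart to pick them up):

    [mds]
        debug mds = 20
        debug ms = 1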



* Re: MDS crash
  2011-05-23 22:45   ` Fyodor Ustinov
@ 2011-05-23 23:08     ` Sage Weil
  2011-05-23 23:52       ` Fyodor Ustinov
  0 siblings, 1 reply; 16+ messages in thread
From: Sage Weil @ 2011-05-23 23:08 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

On Tue, 24 May 2011, Fyodor Ustinov wrote:
> On 05/24/2011 01:27 AM, Sage Weil wrote:
> > Hi Fyodor,
> > 
> > This looks like #1104.  Will try to sort it out today, it should be a
> > simple one.
> > 
> > sage
> Do you need my "debug mds = 20, debug ms = 1" log?
> It is zipped - 26M.

If you can attach it to that bug, that'd be great.  Also, do you have a 
core file?  It's not clear from the other log which bad pointer is being 
dereferenced.

Thanks!
sage
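For reference, the bad pointer can usually be read straight out of the core with gdb; a minimal sketch, assuming the core was produced by the packaged /usr/bin/cmds binary:

    $ gdb /usr/bin/cmds core
    (gdb) bt            # symbolized backtrace; find the
                        # MDCache::get_or_create_stray_dentry frame
    (gdb) frame <N>     # select that frame by its number from bt
    (gdb) info locals   # inspect the pointer being dereferenced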


* Re: MDS crash
  2011-05-23 23:08     ` Sage Weil
@ 2011-05-23 23:52       ` Fyodor Ustinov
  0 siblings, 0 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-05-23 23:52 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 05/24/2011 02:08 AM, Sage Weil wrote:
> On Tue, 24 May 2011, Fyodor Ustinov wrote:
>> On 05/24/2011 01:27 AM, Sage Weil wrote:
>>> Hi Fyodor,
>>>
>>> This looks like #1104.  Will try to sort it out today, it should be a
>>> simple one.
>>>
>>> sage
>> Do you need my "debug mds = 20, debug ms = 1" log?
>> It is zipped - 26M.
> If you can attach it to that bug, that'd be great.  Also, do you have a
> core file?  It's not clear from the other log which bad pointer is being
> dereferenced.
I cannot attach files to this issue:
---
internal error

An error occurred on the page you were trying to access.
If you continue to experience problems please contact your redMine 
administrator for assistance.

-----

Please use these links instead:

http://blog.ufm.su/core.zip
http://blog.ufm.su/mds.zip

WBR,
     Fyodor.




* Re: MDS crash
  2011-05-23 22:27 ` Sage Weil
  2011-05-23 22:45   ` Fyodor Ustinov
@ 2011-05-24  0:32   ` Fyodor Ustinov
  1 sibling, 0 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-05-24  0:32 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi!

Sage, did you receive my email about the "bt"?

WBR,
     Fyodor.


* Re: MDS crash
  2011-05-23 21:21 MDS crash Fyodor Ustinov
  2011-05-23 22:27 ` Sage Weil
@ 2011-05-24 23:54 ` Sage Weil
  1 sibling, 0 replies; 16+ messages in thread
From: Sage Weil @ 2011-05-24 23:54 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

On Tue, 24 May 2011, Fyodor Ustinov wrote:
> *** Caught signal (Segmentation fault) **
>  in thread 0x7f453ef81700
>  ceph version 0.28 (commit:071881d7e5599571e46bda17094bb4b48691e89a)
>  1: /usr/bin/cmds() [0x712c5e]
>  2: (()+0xfc60) [0x7f45411c0c60]
>  3: (MDCache::get_or_create_stray_dentry(CInode*)+0x25) [0x5356f5]
>  4: (Server::handle_client_unlink(MDRequest*)+0x997) [0x508857]
>  5: (Server::handle_client_request(MClientRequest*)+0x522) [0x520852]
>  6: (MDS::handle_deferrable_message(Message*)+0x9af) [0x4a266f]
>  7: (MDS::_dispatch(Message*)+0x173e) [0x4b617e]
>  8: (MDS::_dispatch(Message*)+0x427) [0x4b4e67]
>  9: (MDS::ms_dispatch(Message*)+0x59) [0x4b66c9]
>  10: (SimpleMessenger::dispatch_entry()+0x7ea) [0x4838aa]
>  11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x47b26c]
>  12: (()+0x6d8c) [0x7f45411b7d8c]
>  13: (clone()+0x6d) [0x7f454006a04d]

In case anyone else was seeing this problem, it's now fixed in the
stable branch.

sage


* MDS crash
@ 2011-10-28 22:57 Noah Watkins
  0 siblings, 0 replies; 16+ messages in thread
From: Noah Watkins @ 2011-10-28 22:57 UTC (permalink / raw)
  To: ceph-devel

This is a trace of an MDS crash. I was running a simple setup (./vstart -d -n), and this is from out/mds.b

The crash is from the latest wip-getdir branch. I have included some context preceding the crash, and I have the full trace if more context is helpful.

-Noah

================================

2011-10-28 15:50:00.251876 7f2f3102b700 mds.1.cache.dir(100000003f6) pop_and_dirty_projected_fnode 0x13ab180 v55
2011-10-28 15:50:00.251902 7f2f3102b700 mds.1.cache.dir(100000003f6) mark_dirty (already dirty) [dir 100000003f6 /tmp/hadoop-nwatkins/mapred/staging/nwatkins/.staging/ [2,head] auth{0=1} pv=55 v=55 cv=0/0 ap=1+2+2 state=1610612738|complete f(v0 m2011-10-28 15:50:00.116185 3=0+3)->f(v0 m2011-10-28 15:50:00.116185 3=0+3) n(v5 rc2011-10-28 15:50:00.116185 b284930 5=2+3)->n(v5 rc2011-10-28 15:50:00.116185 b284930 5=2+3) hs=3+1,ss=0+0 dirty=4 | child replicated dirty authpin 0x12b6770] version 55
2011-10-28 15:50:00.251909 7f2f3102b700 mds.1.cache.dir(100000003f5) pop_and_dirty_projected_fnode 0x13abb40 v52
2011-10-28 15:50:00.251936 7f2f3102b700 mds.1.cache.dir(100000003f5) mark_dirty (already dirty) [dir 100000003f5 /tmp/hadoop-nwatkins/mapred/staging/nwatkins/ [2,head] auth{0=1} pv=52 v=52 cv=0/0 ap=1+1+2 state=1610612738|complete f(v0 m2011-10-28 15:39:07.835948 1=0+1)->f(v0 m2011-10-28 15:39:07.835948 1=0+1) n(v9 rc2011-10-28 15:50:00.116185 b284930 6=2+4)/n(v9 rc2011-10-28 15:46:30.070103 b284930 5=2+3)->n(v9 rc2011-10-28 15:50:00.116185 b284930 6=2+4)/n(v9 rc2011-10-28 15:46:30.070103 b284930 5=2+3) hs=1+0,ss=0+0 dirty=1 | child replicated dirty authpin 0x12b6378] version 52
2011-10-28 15:50:00.251957 7f2f3102b700 mds.1.cache send_dentry_link [dentry #1/tmp/hadoop-nwatkins/mapred/staging/nwatkins/.staging/job_201110281545_0003 [2,head] auth (dn xlock x=1 by 0x135bc00) (dversion lock w=1 last_client=4242) v=54 ap=2+0 inode=0x1311b60 | request lock inodepin dirty authpin 0x1345d80]
2011-10-28 15:50:00.251980 7f2f3102b700 mds.1.server reply_request 0 (Success) client_request(client.4242:11 mkdir #100000003f6/job_201110281545_0003) v1
2011-10-28 15:50:00.251990 7f2f3102b700 mds.1.server apply_allocated_inos 20000000004 / [20000000005~3e8] / 0
2011-10-28 15:50:00.252002 7f2f3102b700 mds.1.inotable: apply_alloc_id 20000000004 to [200000003ed~2fffffffc12]/[200000003ec~2fffffffc13]
./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = inodeno_t]', in thread '7f2f3102b700'
./include/interval_set.h: 385: FAILED assert(p->first <= start)
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 2: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 3: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 4: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 5: (Context::complete(int)+0xa) [0x4a4d7a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 7: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 9: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 10: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 11: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 12: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 13: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 14: (()+0x7efc) [0x7f2f348f0efc]
 15: (clone()+0x6d) [0x7f2f3332a89d]
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 2: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 3: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 4: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 5: (Context::complete(int)+0xa) [0x4a4d7a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 7: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 9: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 10: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 11: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 12: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 13: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 14: (()+0x7efc) [0x7f2f348f0efc]
 15: (clone()+0x6d) [0x7f2f3332a89d]
*** Caught signal (Aborted) **
 in thread 7f2f3102b700
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: ./ceph-mds() [0x777fb6]
 2: (()+0x10060) [0x7f2f348f9060]
 3: (gsignal()+0x35) [0x7f2f3327f3a5]
 4: (abort()+0x17b) [0x7f2f33282b0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f2f33b3dd7d]
 6: (()+0xb9f26) [0x7f2f33b3bf26]
 7: (()+0xb9f53) [0x7f2f33b3bf53]
 8: (()+0xba04e) [0x7f2f33b3c04e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x193) [0x6fedf3]
 10: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 11: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 12: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 13: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 14: (Context::complete(int)+0xa) [0x4a4d7a]
 15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 16: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 17: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 18: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 19: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 20: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 21: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 22: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 23: (()+0x7efc) [0x7f2f348f0efc]
 24: (clone()+0x6d) [0x7f2f3332a89d]
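The failed assert is an invariant of the inode table's interval set: erasing an ino from the free list presumes that some stored interval actually covers it, and the apply_alloc_id line above shows an ino (20000000004) below the start of the remaining free interval (200000003ed~...), i.e. one that is no longer free. A toy sketch of the invariant - simplified, not the real interval_set.h code:

    // Toy model of the invariant behind "FAILED assert(p->first <= start)".
    // Intervals are stored as offset -> length; erase(start, len) must find
    // a stored interval that begins at or before `start` and covers the range.
    #include <cassert>
    #include <map>

    struct toy_interval_set {
      std::map<unsigned long, unsigned long> m;  // offset -> length

      void insert(unsigned long off, unsigned long len) { m[off] = len; }

      void erase(unsigned long start, unsigned long len) {
        // candidate: the last interval beginning at or before `start`,
        // otherwise whatever comes after it
        std::map<unsigned long, unsigned long>::iterator p = m.lower_bound(start);
        if (p != m.begin() && (p == m.end() || p->first > start))
          --p;
        assert(p != m.end());
        assert(p->first <= start);                    // the check that fired
        assert(p->first + p->second >= start + len);  // and it covers the range
        // the real code would now split or trim the interval; omitted here
      }
    };

So the assert firing inside InoTable::apply_alloc_id() suggests the allocated ino had already been removed from the free set, e.g. by the same projected allocation being applied twice - a guess from the log, not a confirmed diagnosis.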


* Re: MDS crash.
  2011-07-02 21:30 Fyodor Ustinov
  2011-07-02 22:03 ` Sage Weil
@ 2011-07-05 16:03 ` Sage Weil
  1 sibling, 0 replies; 16+ messages in thread
From: Sage Weil @ 2011-07-05 16:03 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

On Sun, 3 Jul 2011, Fyodor Ustinov wrote:

> Hi!
> 
> mds - 0.30
> 
> I cannot reproduce it, sorry.
> 
> mds/Locker.cc: In function 'void Locker::file_excl(ScatterLock*, bool*)', in
> thread '0x7fefc6c68700'
> mds/Locker.cc: 3982: FAILED assert(in->get_loner() >= 0 &&
> in->mds_caps_wanted.empty())
>  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
>  1: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
>  2: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
>  3: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
> [...]

Thanks, pushed a fix for this to the stable branch.  It'll be included in 
v0.31.

sage


* Re: MDS crash.
  2011-07-02 22:03 ` Sage Weil
@ 2011-07-02 22:16   ` Fyodor Ustinov
  0 siblings, 0 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-07-02 22:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 07/03/2011 01:03 AM, Sage Weil wrote:
> Which commit were you running?
On the mds server, it was installed from here:

deb http://ceph.newdream.net/debian/ natty main
deb-src http://ceph.newdream.net/debian/ natty main



* Re: MDS crash.
  2011-07-02 21:30 Fyodor Ustinov
@ 2011-07-02 22:03 ` Sage Weil
  2011-07-02 22:16   ` Fyodor Ustinov
  2011-07-05 16:03 ` Sage Weil
  1 sibling, 1 reply; 16+ messages in thread
From: Sage Weil @ 2011-07-02 22:03 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

Which commit were you running?

sage

On Sun, 3 Jul 2011, Fyodor Ustinov wrote:

> Hi!
> 
> mds - 0.30
> 
> I cannot reproduce it, sorry.
> 
> mds/Locker.cc: In function 'void Locker::file_excl(ScatterLock*, bool*)', in
> thread '0x7fefc6c68700'
> mds/Locker.cc: 3982: FAILED assert(in->get_loner() >= 0 &&
> in->mds_caps_wanted.empty())
>  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
>  1: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
>  2: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
>  3: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
>  4: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
>  5: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>,
> std::allocator<CInode*> >*)+0x148) [0x5c8488]
>  6: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
>  7: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
>  8: (Server::reply_request(MDRequest*, MClientReply*, CInode*,
> CDentry*)+0x193) [0x4e2633]
>  9: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
>  10: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
>  11: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
>  12: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
>  13: /usr/bin/cmds() [0x5b06d2]
>  14: /usr/bin/cmds() [0x5b0817]
>  15: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*,
> std::allocator<Context*> >*)+0x175f) [0x5c517f]
>  16: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) [0x5c7998]
>  17: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>,
> std::allocator<CInode*> >*)+0x208) [0x5c8548]
>  18: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t,
> Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
>  19: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
>  20: (finish_contexts(std::list<Context*, std::allocator<Context*> >&,
> int)+0xc4) [0x5943b4]
>  21: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) [0x6ae8ab]
>  22: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
>  23: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
>  24: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
>  25: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
>  26: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
>  27: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
>  28: (()+0x6d8c) [0x7fefc969fd8c]
>  29: (clone()+0x6d) [0x7fefc855204d]
>  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
>  1: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
>  2: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
>  3: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
>  4: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
>  5: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>,
> std::allocator<CInode*> >*)+0x148) [0x5c8488]
>  6: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
>  7: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
>  8: (Server::reply_request(MDRequest*, MClientReply*, CInode*,
> CDentry*)+0x193) [0x4e2633]
>  9: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
>  10: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
>  11: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
>  12: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
>  13: /usr/bin/cmds() [0x5b06d2]
>  14: /usr/bin/cmds() [0x5b0817]
>  15: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*,
> std::allocator<Context*> >*)+0x175f) [0x5c517f]
>  16: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) [0x5c7998]
>  17: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>,
> std::allocator<CInode*> >*)+0x208) [0x5c8548]
>  18: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t,
> Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
>  19: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
>  20: (finish_contexts(std::list<Context*, std::allocator<Context*> >&,
> int)+0xc4) [0x5943b4]
>  21: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) [0x6ae8ab]
>  22: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
>  23: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
>  24: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
>  25: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
>  26: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
>  27: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
>  28: (()+0x6d8c) [0x7fefc969fd8c]
>  29: (clone()+0x6d) [0x7fefc855204d]
> *** Caught signal (Aborted) **
>  in thread 0x7fefc6c68700
>  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
>  1: /usr/bin/cmds() [0x70495e]
>  2: (()+0xfc60) [0x7fefc96a8c60]
>  3: (gsignal()+0x35) [0x7fefc849fd05]
>  4: (abort()+0x186) [0x7fefc84a3ab6]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fefc8d566dd]
>  6: (()+0xb9926) [0x7fefc8d54926]
>  7: (()+0xb9953) [0x7fefc8d54953]
>  8: (()+0xb9a5e) [0x7fefc8d54a5e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x371) [0x6cffa1]
>  10: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
>  11: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
>  12: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
>  13: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
>  14: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>,
> std::allocator<CInode*> >*)+0x148) [0x5c8488]
>  15: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
>  16: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
>  17: (Server::reply_request(MDRequest*, MClientReply*, CInode*,
> CDentry*)+0x193) [0x4e2633]
>  18: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
>  19: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
>  20: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
>  21: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
>  22: /usr/bin/cmds() [0x5b06d2]
>  23: /usr/bin/cmds() [0x5b0817]
>  24: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*,
> std::allocator<Context*> >*)+0x175f) [0x5c517f]
>  25: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) [0x5c7998]
>  26: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>,
> std::allocator<CInode*> >*)+0x208) [0x5c8548]
>  27: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t,
> Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
>  28: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
>  29: (finish_contexts(std::list<Context*, std::allocator<Context*> >&,
> int)+0xc4) [0x5943b4]
>  30: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) [0x6ae8ab]
>  31: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
>  32: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
>  33: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
>  34: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
>  35: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
>  36: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
>  37: (()+0x6d8c) [0x7fefc969fd8c]
>  38: (clone()+0x6d) [0x7fefc855204d]
> 
> core bt:
> (gdb) bt
> #0  0x00007fefc96a8b3b in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
> #1  0x0000000000703a53 in ?? ()
> #2  0x0000000000704b7b in ?? ()
> #3 <signal handler called>
> #4  0x00007fefc849fd05 in raise (sig=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #5  0x00007fefc84a3ab6 in abort () at abort.c:92
> #6  0x00007fefc8d566dd in __gnu_cxx::__verbose_terminate_handler() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7  0x00007fefc8d54926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8  0x00007fefc8d54953 in std::terminate() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9  0x00007fefc8d54a5e in __cxa_throw () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x00000000006cffa1 in ceph::__ceph_assert_fail(char const*, char const*,
> int, char const*) ()
> #11 0x00000000005baa04 in Locker::file_excl(ScatterLock*, bool*) ()
> #12 0x00000000005bea41 in Locker::simple_sync(SimpleLock*, bool*) ()
> #13 0x00000000005bf537 in Locker::file_eval(ScatterLock*, bool*) ()
> #14 0x00000000005c7ec4 in Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)
> ()
> #15 0x00000000005c8488 in Locker::drop_locks(Mutation*, std::set<CInode*,
> std::less<CInode*>, std::allocator<CInode*> >*) ()
> #16 0x000000000054247a in MDCache::request_cleanup(MDRequest*) ()
> #17 0x0000000000542b23 in MDCache::request_finish(MDRequest*) ()
> #18 0x00000000004e2633 in Server::reply_request(MDRequest*, MClientReply*,
> CInode*, CDentry*) ()
> #19 0x00000000004ebbe4 in Server::handle_client_stat(MDRequest*) ()
> #20 0x000000000050b4a6 in Server::dispatch_client_request(MDRequest*) ()
> #21 0x000000000052b056 in MDCache::dispatch_request(MDRequest*) ()
> #22 0x00000000005180e1 in C_MDS_RetryRequest::finish(int) ()
> #23 0x00000000005b06d2 in ?? ()
> #24 0x00000000005b0817 in ?? ()
> #25 0x00000000005c517f in Locker::eval_gather(SimpleLock*, bool, bool*,
> std::list<Context*, std::allocator<Context*> >*) ()
> #26 0x00000000005c7998 in Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)
> ()
> #27 0x00000000005c8548 in Locker::drop_locks(Mutation*, std::set<CInode*,
> std::less<CInode*>, std::allocator<CInode*> >*) ()
> #28 0x00000000005cc6c6 in Locker::file_update_finish(CInode*, Mutation*, bool,
> client_t, Capability*, MClientCaps*) ()
> #29 0x00000000005d9b1c in C_Locker_FileUpdate_finish::finish(int) ()
> #30 0x00000000005943b4 in finish_contexts(std::list<Context*,
> std::allocator<Context*> >&, int) ()
> #31 0x00000000006ae8ab in Journaler::_finish_flush(int, unsigned long,
> utime_t) ()
> #32 0x0000000000692b6c in Objecter::handle_osd_op_reply(MOSDOpReply*) ()
> #33 0x00000000004a3e6f in MDS::handle_core_message(Message*) ()
> #34 0x00000000004a4189 in MDS::_dispatch(Message*) ()
> #35 0x00000000004a5d7d in MDS::ms_dispatch(Message*) ()
> #36 0x00000000006e16d3 in SimpleMessenger::dispatch_entry() ()
> #37 0x00000000004829bc in SimpleMessenger::DispatchThread::entry() ()
> #38 0x00007fefc969fd8c in start_thread (arg=0x7fefc6c68700) at
> pthread_create.c:304
> #39 0x00007fefc855204d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #40 0x0000000000000000 in ?? ()
> 
> WBR,
>     Fyodor.


* MDS crash.
@ 2011-07-02 21:30 Fyodor Ustinov
  2011-07-02 22:03 ` Sage Weil
  2011-07-05 16:03 ` Sage Weil
  0 siblings, 2 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-07-02 21:30 UTC (permalink / raw)
  To: ceph-devel

Hi!

mds - 0.30

I cannot reproduce it, sorry.

mds/Locker.cc: In function 'void Locker::file_excl(ScatterLock*, 
bool*)', in thread '0x7fefc6c68700'
mds/Locker.cc: 3982: FAILED assert(in->get_loner() >= 0 && 
in->mds_caps_wanted.empty())
  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
  1: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
  2: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
  3: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
  4: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
  5: (Locker::drop_locks(Mutation*, std::set<CInode*, 
std::less<CInode*>, std::allocator<CInode*> >*)+0x148) [0x5c8488]
  6: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
  7: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
  8: (Server::reply_request(MDRequest*, MClientReply*, CInode*, 
CDentry*)+0x193) [0x4e2633]
  9: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
  10: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
  11: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
  12: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
  13: /usr/bin/cmds() [0x5b06d2]
  14: /usr/bin/cmds() [0x5b0817]
  15: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, 
std::allocator<Context*> >*)+0x175f) [0x5c517f]
  16: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) 
[0x5c7998]
  17: (Locker::drop_locks(Mutation*, std::set<CInode*, 
std::less<CInode*>, std::allocator<CInode*> >*)+0x208) [0x5c8548]
  18: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, 
Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
  19: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
  20: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, 
int)+0xc4) [0x5943b4]
  21: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) 
[0x6ae8ab]
  22: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
  23: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
  24: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
  25: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
  26: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
  27: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
  28: (()+0x6d8c) [0x7fefc969fd8c]
  29: (clone()+0x6d) [0x7fefc855204d]
  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
  1: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
  2: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
  3: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
  4: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
  5: (Locker::drop_locks(Mutation*, std::set<CInode*, 
std::less<CInode*>, std::allocator<CInode*> >*)+0x148) [0x5c8488]
  6: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
  7: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
  8: (Server::reply_request(MDRequest*, MClientReply*, CInode*, 
CDentry*)+0x193) [0x4e2633]
  9: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
  10: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
  11: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
  12: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
  13: /usr/bin/cmds() [0x5b06d2]
  14: /usr/bin/cmds() [0x5b0817]
  15: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, 
std::allocator<Context*> >*)+0x175f) [0x5c517f]
  16: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) 
[0x5c7998]
  17: (Locker::drop_locks(Mutation*, std::set<CInode*, 
std::less<CInode*>, std::allocator<CInode*> >*)+0x208) [0x5c8548]
  18: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, 
Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
  19: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
  20: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, 
int)+0xc4) [0x5943b4]
  21: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) 
[0x6ae8ab]
  22: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
  23: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
  24: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
  25: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
  26: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
  27: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
  28: (()+0x6d8c) [0x7fefc969fd8c]
  29: (clone()+0x6d) [0x7fefc855204d]
*** Caught signal (Aborted) **
  in thread 0x7fefc6c68700
  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
  1: /usr/bin/cmds() [0x70495e]
  2: (()+0xfc60) [0x7fefc96a8c60]
  3: (gsignal()+0x35) [0x7fefc849fd05]
  4: (abort()+0x186) [0x7fefc84a3ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fefc8d566dd]
  6: (()+0xb9926) [0x7fefc8d54926]
  7: (()+0xb9953) [0x7fefc8d54953]
  8: (()+0xb9a5e) [0x7fefc8d54a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x371) [0x6cffa1]
  10: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
  11: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
  12: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
  13: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) 
[0x5c7ec4]
  14: (Locker::drop_locks(Mutation*, std::set<CInode*, 
std::less<CInode*>, std::allocator<CInode*> >*)+0x148) [0x5c8488]
  15: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
  16: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
  17: (Server::reply_request(MDRequest*, MClientReply*, CInode*, 
CDentry*)+0x193) [0x4e2633]
  18: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
  19: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
  20: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
  21: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
  22: /usr/bin/cmds() [0x5b06d2]
  23: /usr/bin/cmds() [0x5b0817]
  24: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, 
std::allocator<Context*> >*)+0x175f) [0x5c517f]
  25: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) 
[0x5c7998]
  26: (Locker::drop_locks(Mutation*, std::set<CInode*, 
std::less<CInode*>, std::allocator<CInode*> >*)+0x208) [0x5c8548]
  27: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, 
Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
  28: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
  29: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, 
int)+0xc4) [0x5943b4]
  30: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) 
[0x6ae8ab]
  31: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
  32: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
  33: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
  34: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
  35: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
  36: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
  37: (()+0x6d8c) [0x7fefc969fd8c]
  38: (clone()+0x6d) [0x7fefc855204d]

core bt:
(gdb) bt
#0  0x00007fefc96a8b3b in raise (sig=<value optimized out>) at 
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x0000000000703a53 in ?? ()
#2  0x0000000000704b7b in ?? ()
#3 <signal handler called>
#4  0x00007fefc849fd05 in raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00007fefc84a3ab6 in abort () at abort.c:92
#6  0x00007fefc8d566dd in __gnu_cxx::__verbose_terminate_handler() () 
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fefc8d54926 in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fefc8d54953 in std::terminate() () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007fefc8d54a5e in __cxa_throw () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000006cffa1 in ceph::__ceph_assert_fail(char const*, char 
const*, int, char const*) ()
#11 0x00000000005baa04 in Locker::file_excl(ScatterLock*, bool*) ()
#12 0x00000000005bea41 in Locker::simple_sync(SimpleLock*, bool*) ()
#13 0x00000000005bf537 in Locker::file_eval(ScatterLock*, bool*) ()
#14 0x00000000005c7ec4 in Locker::rdlock_finish(SimpleLock*, Mutation*, 
bool*) ()
#15 0x00000000005c8488 in Locker::drop_locks(Mutation*, 
std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*) ()
#16 0x000000000054247a in MDCache::request_cleanup(MDRequest*) ()
#17 0x0000000000542b23 in MDCache::request_finish(MDRequest*) ()
#18 0x00000000004e2633 in Server::reply_request(MDRequest*, 
MClientReply*, CInode*, CDentry*) ()
#19 0x00000000004ebbe4 in Server::handle_client_stat(MDRequest*) ()
#20 0x000000000050b4a6 in Server::dispatch_client_request(MDRequest*) ()
#21 0x000000000052b056 in MDCache::dispatch_request(MDRequest*) ()
#22 0x00000000005180e1 in C_MDS_RetryRequest::finish(int) ()
#23 0x00000000005b06d2 in ?? ()
#24 0x00000000005b0817 in ?? ()
#25 0x00000000005c517f in Locker::eval_gather(SimpleLock*, bool, bool*, 
std::list<Context*, std::allocator<Context*> >*) ()
#26 0x00000000005c7998 in Locker::wrlock_finish(SimpleLock*, Mutation*, 
bool*) ()
#27 0x00000000005c8548 in Locker::drop_locks(Mutation*, 
std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*) ()
#28 0x00000000005cc6c6 in Locker::file_update_finish(CInode*, Mutation*, 
bool, client_t, Capability*, MClientCaps*) ()
#29 0x00000000005d9b1c in C_Locker_FileUpdate_finish::finish(int) ()
#30 0x00000000005943b4 in finish_contexts(std::list<Context*, 
std::allocator<Context*> >&, int) ()
#31 0x00000000006ae8ab in Journaler::_finish_flush(int, unsigned long, 
utime_t) ()
#32 0x0000000000692b6c in Objecter::handle_osd_op_reply(MOSDOpReply*) ()
#33 0x00000000004a3e6f in MDS::handle_core_message(Message*) ()
#34 0x00000000004a4189 in MDS::_dispatch(Message*) ()
#35 0x00000000004a5d7d in MDS::ms_dispatch(Message*) ()
#36 0x00000000006e16d3 in SimpleMessenger::dispatch_entry() ()
#37 0x00000000004829bc in SimpleMessenger::DispatchThread::entry() ()
#38 0x00007fefc969fd8c in start_thread (arg=0x7fefc6c68700) at 
pthread_create.c:304
#39 0x00007fefc855204d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#40 0x0000000000000000 in ?? ()

WBR,
     Fyodor.


* Re: mds crash
  2011-04-20 14:00   ` Mark Nigh
@ 2011-04-21 19:08     ` Tommi Virtanen
  0 siblings, 0 replies; 16+ messages in thread
From: Tommi Virtanen @ 2011-04-21 19:08 UTC (permalink / raw)
  To: Mark Nigh; +Cc: Sage Weil, ceph-devel

On Wed, Apr 20, 2011 at 09:00:34AM -0500, Mark Nigh wrote:
> That seems to fix the problem. Both of my mds are up and active.
> 
> mds e1838: 2/2/2 up {0=up:active,1=up:active}
> 
> The only thing that doesn't seem right to me (I am not a developer, so
> my understanding of git is limited) is that my version is the following:
> 
> ceph version 0.26-303-g36f0068 (commit:36f00685633a6f953b046106f5dd31a9169c82d4)
> 
> I don't think this is correct, is it?

I can confirm that your ceph build now includes the commit
d55399ffec224206ea324e83bb8ead1e9ca1eddc that Sage asked you to test, so
all should be good.
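That version string is just 'git describe' output against your checkout, so it is expected; a sketch of how to read it (fields taken from the string above):

    $ git describe
    v0.26-303-g36f0068
    # v0.26    = nearest release tag
    # 303      = commits on top of that tag
    # g36f0068 = "g" + abbreviated hash of the commit actually built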

> 1. What is the proper way to upgrade from version to version? Has
> anyone documented the proper procedure?

We try pretty hard not to break anything, so you should be able to just
grab new releases, either as tarballs or via git (for git, run
something like "git fetch && git checkout v0.26"), build them, and
restart your cluster. However, Ceph is still going through rapid
changes, so read the release notes carefully.
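Spelled out, the from-git upgrade flow looks roughly like this - the build steps are the usual autotools ones and are an assumption here, not taken from this thread:

    $ git fetch
    $ git checkout v0.26
    $ ./autogen.sh && ./configure && make
    $ sudo make install            # or rebuild and install the .debs
    $ sudo /etc/init.d/ceph restart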

> 2. If I just want to test out the file system capabilities, do I
> need to install all the packages, or are there some I can do
> without? I have been installing the following packages
> (Ubuntu v10.10) in this order.
> 
> sudo dpkg -i libcrush1_0.26-1_amd64.deb
> sudo dpkg -i libceph1_0.26-1_amd64.deb
> sudo dpkg -i ceph-fuse_0.26-1_amd64.deb
> sudo dpkg -i librados2_0.26-1_amd64.deb
> sudo dpkg -i librbd1_0.26-1_amd64.deb
> sudo dpkg -i ceph_0.26-1_amd64.deb

The server side needs ceph.deb.

For in-kernel client side, you need a fresh enough kernel, and
mount.ceph in ceph-client-tools makes it nicer to use.

For FUSE client side, you need libfuse.

librados is a client library for the object store, you don't seem to
be using it.

librbd is a client library for things like QEMU/KVM, you don't seem to
be using it.

I don't see any of our packages actually needing libcrush or libceph
(I guess that means they bundle the library in the binary directly),
so you shouldn't need those either (at least right now; some day we
may wish to wrestle the Automake monster and change that).

-- 
:(){ :|:&};:


* RE: mds crash
  2011-04-19 16:17 ` Sage Weil
@ 2011-04-20 14:00   ` Mark Nigh
  2011-04-21 19:08     ` Tommi Virtanen
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Nigh @ 2011-04-20 14:00 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Sage,

Thanks for the quick response.

That seems to fix the problem. Both of my mds are up and active.

mds e1838: 2/2/2 up {0=up:active,1=up:active}

The only thing that doesn't seem right to me (I am not a developer, so my understanding of git is limited) is that my version is the following:

ceph version 0.26-303-g36f0068 (commit:36f00685633a6f953b046106f5dd31a9169c82d4)

I don't think this is correct, is it?

I do have a few questions for the group as well.

1. What is the proper way to upgrade from version to version? Has anyone documented the proper procedure?

2. If I just want to test out the file system capabilities, do I need to install all the packages, or are there some I can do without? I have been installing the following packages (Ubuntu v10.10) in this order.

sudo dpkg -i libcrush1_0.26-1_amd64.deb
sudo dpkg -i libceph1_0.26-1_amd64.deb
sudo dpkg -i ceph-fuse_0.26-1_amd64.deb
sudo dpkg -i librados2_0.26-1_amd64.deb
sudo dpkg -i librbd1_0.26-1_amd64.deb
sudo dpkg -i ceph_0.26-1_amd64.deb

Thanks for everyone's help.

Mark Nigh
Systems Architect
mnigh@netelligent.com
 (p) 314.392.6926


-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Tuesday, April 19, 2011 11:18 AM
To: Mark Nigh
Cc: ceph-devel@vger.kernel.org
Subject: Re: mds crash

Hi Mark,

This should be fixed by d55399ffec224206ea324e83bb8ead1e9ca1eddc in the
'next' branch of ceph.git.  Can you test it out and see if that allows
journal replay to complete?

Thanks!
sage

http://tracker.newdream.net/issues/1019



On Tue, 19 Apr 2011, Mark Nigh wrote:

> I have recently been working on exporting Ceph over NFS. I have had stability problems with NFS (Ceph keeps working, but NFS crashes), and most recently my mds0 will not start after one of these NFS incidents.
>
> My setup: 2 mds, 1 mon (located on mds0), 5 osds, all running Ubuntu v10.10.
>
> Here is the output when I try to start the mds0. Is there other debugging I can turn on?
>
> /etc/init.d/ceph start mds0
>
> 2011-04-19 10:06:58.602640 7fb202fe4700 mds0.11 ms_handle_connect on 10.6.1.93:6800/945
> ./include/elist.h: In function 'elist<T>::item::~item() [with T = MDSlaveUpdate*]', in thread '0x7fb2004d5700'
> ./include/elist.h: 39: FAILED assert(!is_on_list())
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  3: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  5: (()+0x7971) [0x7fb20564a971]
>  6: (clone()+0x6d) [0x7fb2042e692d]
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  3: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  5: (()+0x7971) [0x7fb20564a971]
>  6: (clone()+0x6d) [0x7fb2042e692d]
> *** Caught signal (Aborted) **
>  in thread 0x7fb2004d5700
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: /usr/bin/cmds() [0x70fc38]
>  2: (()+0xfb40) [0x7fb205652b40]
>  3: (gsignal()+0x35) [0x7fb204233ba5]
>  4: (abort()+0x180) [0x7fb2042376b0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb204ad76bd]
>  6: (()+0xb9906) [0x7fb204ad5906]
>  7: (()+0xb9933) [0x7fb204ad5933]
>  8: (()+0xb9a3e) [0x7fb204ad5a3e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x36a) [0x6f5eaa]
>  10: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  11: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  12: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  13: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  14: (()+0x7971) [0x7fb20564a971]
>  15: (clone()+0x6d) [0x7fb2042e692d]
>
> I am not sure why the IP address 0.0.0.0 shows up when starting mds0.
>
> root@mds0:/var/log/ceph# /etc/init.d/ceph start mds0
> === mds.0 ===
> Starting Ceph mds.0 on mds0...
>  ** WARNING: Ceph is still under heavy development, and is only suitable for **
>  **          testing and review.  Do not trust it with important data.       **
> starting mds.0 at 0.0.0.0:6800/2994
>
> Thanks for your assistance.
>
> Mark Nigh
> Systems Architect
> mnigh@netelligent.com
>  (p) 314.392.6926
>
>
>
>



* Re: mds crash
  2011-04-19 15:18 mds crash Mark Nigh
@ 2011-04-19 16:17 ` Sage Weil
  2011-04-20 14:00   ` Mark Nigh
  0 siblings, 1 reply; 16+ messages in thread
From: Sage Weil @ 2011-04-19 16:17 UTC (permalink / raw)
  To: Mark Nigh; +Cc: ceph-devel

Hi Mark,

This should be fixed by d55399ffec224206ea324e83bb8ead1e9ca1eddc in the 
'next' branch of ceph.git.  Can you test it out and see if that allows 
journal replay to complete?

Thanks!
sage

http://tracker.newdream.net/issues/1019
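One way to double-check that a build actually contains that commit - a sketch, assuming a git checkout of ceph.git:

    $ git fetch && git checkout next
    $ git branch --contains d55399ffec224206ea324e83bb8ead1e9ca1eddc
    # should list 'next' once the fix is merged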



On Tue, 19 Apr 2011, Mark Nigh wrote:

> I have recently been working on exporting Ceph over NFS. I have had stability problems with NFS (Ceph keeps working, but NFS crashes), and most recently my mds0 will not start after one of these NFS incidents.
> 
> My setup: 2 mds, 1 mon (located on mds0), 5 osds, all running Ubuntu v10.10.
> 
> Here is the output when I try to start the mds0. Is there other debugging I can turn on?
> 
> /etc/init.d/ceph start mds0
> 
> 2011-04-19 10:06:58.602640 7fb202fe4700 mds0.11 ms_handle_connect on 10.6.1.93:6800/945
> ./include/elist.h: In function 'elist<T>::item::~item() [with T = MDSlaveUpdate*]', in thread '0x7fb2004d5700'
> ./include/elist.h: 39: FAILED assert(!is_on_list())
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  3: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  5: (()+0x7971) [0x7fb20564a971]
>  6: (clone()+0x6d) [0x7fb2042e692d]
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  3: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  5: (()+0x7971) [0x7fb20564a971]
>  6: (clone()+0x6d) [0x7fb2042e692d]
> *** Caught signal (Aborted) **
>  in thread 0x7fb2004d5700
>  ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>  1: /usr/bin/cmds() [0x70fc38]
>  2: (()+0xfb40) [0x7fb205652b40]
>  3: (gsignal()+0x35) [0x7fb204233ba5]
>  4: (abort()+0x180) [0x7fb2042376b0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb204ad76bd]
>  6: (()+0xb9906) [0x7fb204ad5906]
>  7: (()+0xb9933) [0x7fb204ad5933]
>  8: (()+0xb9a3e) [0x7fb204ad5a3e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x36a) [0x6f5eaa]
>  10: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
>  11: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
>  12: (MDLog::_replay_thread()+0xb90) [0x67f850]
>  13: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
>  14: (()+0x7971) [0x7fb20564a971]
>  15: (clone()+0x6d) [0x7fb2042e692d]
> 
> I am not sure why the IP address 0.0.0.0 shows up when starting mds0.
> 
> root@mds0:/var/log/ceph# /etc/init.d/ceph start mds0
> === mds.0 ===
> Starting Ceph mds.0 on mds0...
>  ** WARNING: Ceph is still under heavy development, and is only suitable for **
>  **          testing and review.  Do not trust it with important data.       **
> starting mds.0 at 0.0.0.0:6800/2994
> 
> Thanks for your assistance.
> 
> Mark Nigh
> Systems Architect
> mnigh@netelligent.com
>  (p) 314.392.6926
> 
> 
> 
> 


* mds crash
@ 2011-04-19 15:18 Mark Nigh
  2011-04-19 16:17 ` Sage Weil
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Nigh @ 2011-04-19 15:18 UTC (permalink / raw)
  To: ceph-devel

I have recently been working on exporting Ceph over NFS. I have had stability problems with NFS (Ceph keeps working, but NFS crashes), and most recently my mds0 will not start after one of these NFS incidents.

My setup: 2 mds, 1 mon (located on mds0), 5 osds, all running Ubuntu v10.10.

Here is the output when I try to start the mds0. Is there other debugging I can turn on?

/etc/init.d/ceph start mds0

2011-04-19 10:06:58.602640 7fb202fe4700 mds0.11 ms_handle_connect on 10.6.1.93:6800/945
./include/elist.h: In function 'elist<T>::item::~item() [with T = MDSlaveUpdate*]', in thread '0x7fb2004d5700'
./include/elist.h: 39: FAILED assert(!is_on_list())
 ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
 1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
 2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
 3: (MDLog::_replay_thread()+0xb90) [0x67f850]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
 5: (()+0x7971) [0x7fb20564a971]
 6: (clone()+0x6d) [0x7fb2042e692d]
 ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
 1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
 2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
 3: (MDLog::_replay_thread()+0xb90) [0x67f850]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
 5: (()+0x7971) [0x7fb20564a971]
 6: (clone()+0x6d) [0x7fb2042e692d]
*** Caught signal (Aborted) **
 in thread 0x7fb2004d5700
 ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
 1: /usr/bin/cmds() [0x70fc38]
 2: (()+0xfb40) [0x7fb205652b40]
 3: (gsignal()+0x35) [0x7fb204233ba5]
 4: (abort()+0x180) [0x7fb2042376b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb204ad76bd]
 6: (()+0xb9906) [0x7fb204ad5906]
 7: (()+0xb9933) [0x7fb204ad5933]
 8: (()+0xb9a3e) [0x7fb204ad5a3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x36a) [0x6f5eaa]
 10: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
 11: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
 12: (MDLog::_replay_thread()+0xb90) [0x67f850]
 13: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
 14: (()+0x7971) [0x7fb20564a971]
 15: (clone()+0x6d) [0x7fb2042e692d]

I am not sure why the IP address 0.0.0.0 shows up when starting mds0.

root@mds0:/var/log/ceph# /etc/init.d/ceph start mds0
=== mds.0 ===
Starting Ceph mds.0 on mds0...
 ** WARNING: Ceph is still under heavy development, and is only suitable for **
 **          testing and review.  Do not trust it with important data.       **
starting mds.0 at 0.0.0.0:6800/2994
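The 0.0.0.0 above is normal: the mds binds to a wildcard address first and learns its public address once it has registered with the monitors. If a fixed address is wanted, something like the following ceph.conf entry should pin it - a sketch only; the 'public addr' option name and the address are assumptions, not taken from this thread:

    [mds.0]
        host = mds0
        public addr = 10.6.1.90:6800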

Thanks for your assistance.

Mark Nigh
Systems Architect
mnigh@netelligent.com
 (p) 314.392.6926






Thread overview:
2011-05-23 21:21 MDS crash Fyodor Ustinov
2011-05-23 22:27 ` Sage Weil
2011-05-23 22:45   ` Fyodor Ustinov
2011-05-23 23:08     ` Sage Weil
2011-05-23 23:52       ` Fyodor Ustinov
2011-05-24  0:32   ` Fyodor Ustinov
2011-05-24 23:54 ` Sage Weil
  -- strict thread matches above, loose matches on Subject: below --
2011-10-28 22:57 Noah Watkins
2011-07-02 21:30 Fyodor Ustinov
2011-07-02 22:03 ` Sage Weil
2011-07-02 22:16   ` Fyodor Ustinov
2011-07-05 16:03 ` Sage Weil
2011-04-19 15:18 mds crash Mark Nigh
2011-04-19 16:17 ` Sage Weil
2011-04-20 14:00   ` Mark Nigh
2011-04-21 19:08     ` Tommi Virtanen
