* OSD crash
@ 2011-05-27  0:12 Fyodor Ustinov
  2011-05-27 15:16 ` Gregory Farnum
  0 siblings, 1 reply; 28+ messages in thread
From: Fyodor Ustinov @ 2011-05-27  0:12 UTC (permalink / raw)
  To: ceph-devel

Hi!

2011-05-27 02:35:22.046798 7fa8ff058700 journal check_for_full at 
837623808 : JOURNAL FULL 837623808 >= 147455 (max_size 996147200 start 
837771264)
2011-05-27 02:35:23.479379 7fa8f7f49700 journal throttle: waited for bytes
2011-05-27 02:35:34.730418 7fa8ff058700 journal check_for_full at 
836984832 : JOURNAL FULL 836984832 >= 638975 (max_size 996147200 start 
837623808)
2011-05-27 02:35:36.050384 7fa8f7f49700 journal throttle: waited for bytes
2011-05-27 02:35:47.226789 7fa8ff058700 journal check_for_full at 
836882432 : JOURNAL FULL 836882432 >= 102399 (max_size 996147200 start 
836984832)
2011-05-27 02:35:48.937259 7fa8f874a700 journal throttle: waited for bytes
2011-05-27 02:35:59.985040 7fa8ff058700 journal check_for_full at 
836685824 : JOURNAL FULL 836685824 >= 196607 (max_size 996147200 start 
836882432)
2011-05-27 02:36:01.654955 7fa8f874a700 journal throttle: waited for bytes
2011-05-27 02:36:12.362896 7fa8ff058700 journal check_for_full at 
835723264 : JOURNAL FULL 835723264 >= 962559 (max_size 996147200 start 
836685824)
2011-05-27 02:36:14.375435 7fa8f7f49700 journal throttle: waited for bytes
./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*) 
[with T = PG*]', in thread '0x7fa8f7748700'
./include/xlist.h: 107: FAILED assert(i->_list == this)
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  3: (ThreadPool::worker()+0x10a) [0x65799a]
  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  5: (()+0x6d8c) [0x7fa904294d8c]
  6: (clone()+0x6d) [0x7fa90314704d]
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  3: (ThreadPool::worker()+0x10a) [0x65799a]
  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  5: (()+0x6d8c) [0x7fa904294d8c]
  6: (clone()+0x6d) [0x7fa90314704d]
*** Caught signal (Aborted) **
  in thread 0x7fa8f7748700
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: /usr/bin/cosd() [0x6729f9]
  2: (()+0xfc60) [0x7fa90429dc60]
  3: (gsignal()+0x35) [0x7fa903094d05]
  4: (abort()+0x186) [0x7fa903098ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fa90394b6dd]
  6: (()+0xb9926) [0x7fa903949926]
  7: (()+0xb9953) [0x7fa903949953]
  8: (()+0xb9a5e) [0x7fa903949a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x362) [0x655e32]
  10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  12: (ThreadPool::worker()+0x10a) [0x65799a]
  13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  14: (()+0x6d8c) [0x7fa904294d8c]
  15: (clone()+0x6d) [0x7fa90314704d]

WBR,
     Fyodor.


* Re: OSD crash
  2011-05-27  0:12 OSD crash Fyodor Ustinov
@ 2011-05-27 15:16 ` Gregory Farnum
  2011-05-27 16:41   ` Fyodor Ustinov
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Farnum @ 2011-05-27 15:16 UTC (permalink / raw)
  To: ceph-devel, Fyodor Ustinov

This is an interesting one -- the invariant that the assert is checking
isn't too complicated (that the object lives on the RecoveryWQ's
queue) and seems to hold everywhere the RecoveryWQ is called. And the
functions modifying the queue are always called under the workqueue
lock, and do maintenance if the xlist::item is on a different list.
Which makes me think that the problem must come from conflating the
RecoveryWQ lock and the PG lock in the few places that modify
PG::recovery_item directly, rather than via RecoveryWQ functions.
Does anybody more familiar with this than me have ideas?
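
For anyone not looking at the source, here is a minimal sketch of the
invariant involved (illustration only: simplified names, not the actual
xlist.h/OSD.cc code):

#include <cassert>
#include <list>

template <typename T>
struct TinyXlist {
  struct Item {
    T value;
    TinyXlist* _list = nullptr;        // the list this item currently lives on
    explicit Item(T v) : value(v) {}
    void remove_myself() { if (_list) _list->remove(this); }
  };

  std::list<Item*> entries;

  void push_back(Item* i) {
    if (i->_list) i->_list->remove(i); // "maintenance if on a different list"
    entries.push_back(i);
    i->_list = this;
  }

  void remove(Item* i) {
    assert(i->_list == this);          // the check that fired at xlist.h:107
    entries.remove(i);
    i->_list = nullptr;
  }

  Item* pop_front() {                  // roughly what _dequeue() does
    assert(!entries.empty());
    Item* i = entries.front();
    remove(i);
    return i;
  }
};

If one thread is inside pop_front() under the workqueue lock while another
thread calls remove_myself() or push_back() on the same item holding only
the PG lock, the item's _list can change out from under the first thread
and the assert trips -- which is the conflation described above.
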
Fyodor, based on the time stamps and output you've given us, I assume
you don't have more detailed logs?
-Greg

On Thu, May 26, 2011 at 5:12 PM, Fyodor Ustinov <ufm@ufm.su> wrote:
> Hi!
>
> 2011-05-27 02:35:22.046798 7fa8ff058700 journal check_for_full at 837623808
> : JOURNAL FULL 837623808 >= 147455 (max_size 996147200 start 837771264)
> 2011-05-27 02:35:23.479379 7fa8f7f49700 journal throttle: waited for bytes
> 2011-05-27 02:35:34.730418 7fa8ff058700 journal check_for_full at 836984832
> : JOURNAL FULL 836984832 >= 638975 (max_size 996147200 start 837623808)
> 2011-05-27 02:35:36.050384 7fa8f7f49700 journal throttle: waited for bytes
> 2011-05-27 02:35:47.226789 7fa8ff058700 journal check_for_full at 836882432
> : JOURNAL FULL 836882432 >= 102399 (max_size 996147200 start 836984832)
> 2011-05-27 02:35:48.937259 7fa8f874a700 journal throttle: waited for bytes
> 2011-05-27 02:35:59.985040 7fa8ff058700 journal check_for_full at 836685824
> : JOURNAL FULL 836685824 >= 196607 (max_size 996147200 start 836882432)
> 2011-05-27 02:36:01.654955 7fa8f874a700 journal throttle: waited for bytes
> 2011-05-27 02:36:12.362896 7fa8ff058700 journal check_for_full at 835723264
> : JOURNAL FULL 835723264 >= 962559 (max_size 996147200 start 836685824)
> 2011-05-27 02:36:14.375435 7fa8f7f49700 journal throttle: waited for bytes
> ./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*) [with
> T = PG*]', in thread '0x7fa8f7748700'
> ./include/xlist.h: 107: FAILED assert(i->_list == this)
>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>  3: (ThreadPool::worker()+0x10a) [0x65799a]
>  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>  5: (()+0x6d8c) [0x7fa904294d8c]
>  6: (clone()+0x6d) [0x7fa90314704d]
>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>  3: (ThreadPool::worker()+0x10a) [0x65799a]
>  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>  5: (()+0x6d8c) [0x7fa904294d8c]
>  6: (clone()+0x6d) [0x7fa90314704d]
> *** Caught signal (Aborted) **
>  in thread 0x7fa8f7748700
>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>  1: /usr/bin/cosd() [0x6729f9]
>  2: (()+0xfc60) [0x7fa90429dc60]
>  3: (gsignal()+0x35) [0x7fa903094d05]
>  4: (abort()+0x186) [0x7fa903098ab6]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fa90394b6dd]
>  6: (()+0xb9926) [0x7fa903949926]
>  7: (()+0xb9953) [0x7fa903949953]
>  8: (()+0xb9a5e) [0x7fa903949a5e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x362) [0x655e32]
>  10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>  11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>  12: (ThreadPool::worker()+0x10a) [0x65799a]
>  13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>  14: (()+0x6d8c) [0x7fa904294d8c]
>  15: (clone()+0x6d) [0x7fa90314704d]
>
> WBR,
>    Fyodor.


* Re: OSD crash
  2011-05-27 15:16 ` Gregory Farnum
@ 2011-05-27 16:41   ` Fyodor Ustinov
  2011-05-27 16:49     ` Gregory Farnum
  0 siblings, 1 reply; 28+ messages in thread
From: Fyodor Ustinov @ 2011-05-27 16:41 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On 05/27/2011 06:16 PM, Gregory Farnum wrote:
> This is an interesting one -- the invariant that assert is checking
> isn't too complicated (that the object lives on the RecoveryWQ's
> queue) and seems to hold everywhere the RecoveryWQ is called. And the
> functions modifying the queue are always called under the workqueue
> lock, and do maintenance if the xlist::item is on a different list.
> Which makes me think that the problem must be from conflating the
> RecoveryWQ lock and the PG lock in the few places that modify the
> PG::recovery_item directly, rather than via RecoveryWQ functions.
> Anybody more familiar than me with this have ideas?
> Fyodor, based on the time stamps and output you've given us, I assume
> you don't have more detailed logs?
> -Greg

Greg, I got this crash again.
Let me describe the configuration and what happened.
Configuration:
6 OSD servers: 4G RAM, 4*1T HDD (mdadm RAID0), 2*1G etherchannel
ethernet, Ubuntu Server 11.04/64 with a hand-compiled 2.6.39 kernel.
The mon+mds server has 24G RAM and runs the same OS.

On each OSD the journal is placed on a 1G tmpfs; OSD data is on xfs in
this case.

Configuration file:

[global]
         max open files = 131072
         log file = /var/log/ceph/$name.log
         pid file = /var/run/ceph/$name.pid

[mon]
         mon data = /mfs/mon$id

[mon.0]
         mon addr  = 10.5.51.230:6789

[mds]
         keyring = /mfs/mds/keyring.$name

[mds.0]
         host = mds0


[osd]
         osd data = /$name
         osd journal = /journal/$name
         osd journal size = 950
         journal dio = false

[osd.0]
         host = osd0
         cluster addr = 10.5.51.10
         public addr = 10.5.51.140

[osd.1]
         host = osd1
         cluster addr = 10.5.51.11
         public addr = 10.5.51.141

[osd.2]
         host = osd2
         cluster addr = 10.5.51.12
         public addr = 10.5.51.142

[osd.3]
         host = osd3
         cluster addr = 10.5.51.13
         public addr = 10.5.51.143

[osd.4]
         host = osd4
         cluster addr = 10.5.51.14
         public addr = 10.5.51.144

[osd.5]
         host = osd5
         cluster addr = 10.5.51.15
         public addr = 10.5.51.145

What happened:
osd2 crashed and was rebooted; its osd data and journal were recreated
from scratch with "cosd --mkfs -i 2 --monmap /tmp/monmap" and the server
was started.
Additionally, "writeahead" journal mode is enabled on osd2, but I don't
think that matters in this case.

Well, the server started rebalancing:

2011-05-27 15:12:49.323558 7f3b69de5740 ceph version 0.28.1.commit: 
d66c6ca19bbde3c363b135b66072de44e67c6632. process: cosd. pid: 1694
2011-05-27 15:12:49.325331 7f3b69de5740 filestore(/osd.2) mount FIEMAP 
ioctl is NOT supported
2011-05-27 15:12:49.325378 7f3b69de5740 filestore(/osd.2) mount did NOT 
detect btrfs
2011-05-27 15:12:49.325467 7f3b69de5740 filestore(/osd.2) mount found 
snaps <>
2011-05-27 15:12:49.325512 7f3b69de5740 filestore(/osd.2) mount: 
WRITEAHEAD journal mode explicitly enabled in conf
2011-05-27 15:12:49.325526 7f3b69de5740 filestore(/osd.2) mount WARNING: 
not btrfs or ext3; data may be lost
2011-05-27 15:12:49.325606 7f3b69de5740 journal _open /journal/osd.2 fd 
11: 996147200 bytes, block size 4096 bytes, directio = 0
2011-05-27 15:12:49.325641 7f3b69de5740 journal read_entry 4096 : seq 1 
203 bytes
2011-05-27 15:12:49.325698 7f3b69de5740 journal _open /journal/osd.2 fd 
11: 996147200 bytes, block size 4096 bytes, directio = 0
2011-05-27 15:12:49.544716 7f3b59656700 -- 10.5.51.12:6801/1694 >> 
10.5.51.14:6801/5070 pipe(0x1239d20 sd=27 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544798 7f3b59c5c700 -- 10.5.51.12:6801/1694 >> 
10.5.51.13:6801/5165 pipe(0x104b950 sd=14 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544864 7f3b59757700 -- 10.5.51.12:6801/1694 >> 
10.5.51.15:6801/1574 pipe(0x11e7cd0 sd=16 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544909 7f3b59959700 -- 10.5.51.12:6801/1694 >> 
10.5.51.10:6801/6148 pipe(0x11d7d30 sd=15 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:13:23.015637 7f3b64579700 journal check_for_full at 
66404352 : JOURNAL FULL 66404352 >= 851967 (max_size 996147200 start 
67256320)
2011-05-27 15:13:25.586081 7f3b5dc6b700 journal throttle: waited for bytes
2011-05-27 15:13:25.601789 7f3b5d46a700 journal throttle: waited for bytes

[...] and after 2 hours:

2011-05-27 17:30:21.355034 7f3b64579700 journal check_for_full at 
415199232 : JOURNAL FULL 415199232 >= 778239 (max_size 996147200 start 
415977472)
2011-05-27 17:30:23.441445 7f3b5d46a700 journal throttle: waited for bytes
2011-05-27 17:30:36.362877 7f3b64579700 journal check_for_full at 
414326784 : JOURNAL FULL 414326784 >= 872447 (max_size 996147200 start 
415199232)
2011-05-27 17:30:38.391372 7f3b5d46a700 journal throttle: waited for bytes
2011-05-27 17:30:50.373936 7f3b64579700 journal check_for_full at 
414314496 : JOURNAL FULL 414314496 >= 12287 (max_size 996147200 start 
414326784)
./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*) 
[with T = PG*]', in thread '0x7f3b5cc69700'
./include/xlist.h: 107: FAILED assert(i->_list == this)
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  3: (ThreadPool::worker()+0x10a) [0x65799a]
  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  5: (()+0x6d8c) [0x7f3b697b5d8c]
  6: (clone()+0x6d) [0x7f3b6866804d]
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  3: (ThreadPool::worker()+0x10a) [0x65799a]
  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  5: (()+0x6d8c) [0x7f3b697b5d8c]
  6: (clone()+0x6d) [0x7f3b6866804d]
*** Caught signal (Aborted) **
  in thread 0x7f3b5cc69700
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: /usr/bin/cosd() [0x6729f9]
  2: (()+0xfc60) [0x7f3b697bec60]
  3: (gsignal()+0x35) [0x7f3b685b5d05]
  4: (abort()+0x186) [0x7f3b685b9ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3b68e6c6dd]
  6: (()+0xb9926) [0x7f3b68e6a926]
  7: (()+0xb9953) [0x7f3b68e6a953]
  8: (()+0xb9a5e) [0x7f3b68e6aa5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x362) [0x655e32]
  10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  12: (ThreadPool::worker()+0x10a) [0x65799a]
  13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  14: (()+0x6d8c) [0x7f3b697b5d8c]
  15: (clone()+0x6d) [0x7f3b6866804d]

I.e. it's not an easily reproduced bug. While I had less data in the
cluster, I did not see this error.

I think I do not have enough space to keep a "full" log for 2-3 hours. Sorry.

WBR,
     Fyodor.



* Re: OSD crash
  2011-05-27 16:41   ` Fyodor Ustinov
@ 2011-05-27 16:49     ` Gregory Farnum
  2011-05-27 19:18       ` Gregory Farnum
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Farnum @ 2011-05-27 16:49 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

Okay, one of us should have a patch for you later today to try out;
the only cause I can come up with seems unlikely but there aren't many
options so it should be pretty simple. :)
I opened a tracker issue for it: http://tracker.newdream.net/issues/1116
-Greg

On Fri, May 27, 2011 at 9:41 AM, Fyodor Ustinov <ufm@ufm.su> wrote:
> On 05/27/2011 06:16 PM, Gregory Farnum wrote:
>>
>> This is an interesting one -- the invariant that assert is checking
>> isn't too complicated (that the object lives on the RecoveryWQ's
>> queue) and seems to hold everywhere the RecoveryWQ is called. And the
>> functions modifying the queue are always called under the workqueue
>> lock, and do maintenance if the xlist::item is on a different list.
>> Which makes me think that the problem must be from conflating the
>> RecoveryWQ lock and the PG lock in the few places that modify the
>> PG::recovery_item directly, rather than via RecoveryWQ functions.
>> Anybody more familiar than me with this have ideas?
>> Fyodor, based on the time stamps and output you've given us, I assume
>> you don't have more detailed logs?
>> -Greg
>
> Greg, i got this crash again.
> Let me tell you the configuration and what is happening:
> Configuration:
> 6 osd servers. 4G RAM, 4*1T hdd (mdadmed to raid0), 2*1G etherchannel
> ethernet, Ubuntu server 11.04/64  with kernel 2.6.39 (hand compiled)
> mon+mds server 24G RAM, the same os.
>
> On each OSD Journal placed on 1G tempfs. OSD data - on xfs in this case.
>
> Configuration file:
>
> [global]
>        max open files = 131072
>        log file = /var/log/ceph/$name.log
>        pid file = /var/run/ceph/$name.pid
>
> [mon]
>        mon data = /mfs/mon$id
>
> [mon.0]
>        mon addr  = 10.5.51.230:6789
>
> [mds]
>        keyring = /mfs/mds/keyring.$name
>
> [mds.0]
>        host = mds0
>
>
> [osd]
>        osd data = /$name
>        osd journal = /journal/$name
>        osd journal size = 950
>        journal dio = false
>
> [osd.0]
>        host = osd0
>        cluster addr = 10.5.51.10
>        public addr = 10.5.51.140
>
> [osd.1]
>        host = osd1
>        cluster addr = 10.5.51.11
>        public addr = 10.5.51.141
>
> [osd.2]
>        host = osd2
>        cluster addr = 10.5.51.12
>        public addr = 10.5.51.142
>
> [osd.3]
>        host = osd3
>        cluster addr = 10.5.51.13
>        public addr = 10.5.51.143
>
> [osd.4]
>        host = osd4
>        cluster addr = 10.5.51.14
>        public addr = 10.5.51.144
>
> [osd.5]
>        host = osd5
>        cluster addr = 10.5.51.15
>        public addr = 10.5.51.145
>
> What happening:
> osd2 was crashed, rebooted, osd data and journal created from scratch by
> "cosd --mkfs -i 2 --monmap /tmp/monmap" and server started.
> Additional - on osd2 enables "writeahaed", but I think it's not principal in
> this case.
>
> Well, server start rebalancing:
>
> 2011-05-27 15:12:49.323558 7f3b69de5740 ceph version 0.28.1.commit:
> d66c6ca19bbde3c363b135b66072de44e67c6632. process: cosd. pid: 1694
> 2011-05-27 15:12:49.325331 7f3b69de5740 filestore(/osd.2) mount FIEMAP ioctl
> is NOT supported
> 2011-05-27 15:12:49.325378 7f3b69de5740 filestore(/osd.2) mount did NOT
> detect btrfs
> 2011-05-27 15:12:49.325467 7f3b69de5740 filestore(/osd.2) mount found snaps
> <>
> 2011-05-27 15:12:49.325512 7f3b69de5740 filestore(/osd.2) mount: WRITEAHEAD
> journal mode explicitly enabled in conf
> 2011-05-27 15:12:49.325526 7f3b69de5740 filestore(/osd.2) mount WARNING: not
> btrfs or ext3; data may be lost
> 2011-05-27 15:12:49.325606 7f3b69de5740 journal _open /journal/osd.2 fd 11:
> 996147200 bytes, block size 4096 bytes, directio = 0
> 2011-05-27 15:12:49.325641 7f3b69de5740 journal read_entry 4096 : seq 1 203
> bytes
> 2011-05-27 15:12:49.325698 7f3b69de5740 journal _open /journal/osd.2 fd 11:
> 996147200 bytes, block size 4096 bytes, directio = 0
> 2011-05-27 15:12:49.544716 7f3b59656700 -- 10.5.51.12:6801/1694 >>
> 10.5.51.14:6801/5070 pipe(0x1239d20 sd=27 pgs=0 cs=0 l=0).accept we reset
> (peer sent cseq 2), sending RESETSESSION
> 2011-05-27 15:12:49.544798 7f3b59c5c700 -- 10.5.51.12:6801/1694 >>
> 10.5.51.13:6801/5165 pipe(0x104b950 sd=14 pgs=0 cs=0 l=0).accept we reset
> (peer sent cseq 2), sending RESETSESSION
> 2011-05-27 15:12:49.544864 7f3b59757700 -- 10.5.51.12:6801/1694 >>
> 10.5.51.15:6801/1574 pipe(0x11e7cd0 sd=16 pgs=0 cs=0 l=0).accept we reset
> (peer sent cseq 2), sending RESETSESSION
> 2011-05-27 15:12:49.544909 7f3b59959700 -- 10.5.51.12:6801/1694 >>
> 10.5.51.10:6801/6148 pipe(0x11d7d30 sd=15 pgs=0 cs=0 l=0).accept we reset
> (peer sent cseq 2), sending RESETSESSION
> 2011-05-27 15:13:23.015637 7f3b64579700 journal check_for_full at 66404352 :
> JOURNAL FULL 66404352 >= 851967 (max_size 996147200 start 67256320)
> 2011-05-27 15:13:25.586081 7f3b5dc6b700 journal throttle: waited for bytes
> 2011-05-27 15:13:25.601789 7f3b5d46a700 journal throttle: waited for bytes
>
> [...] and after 2 hours:
>
> 2011-05-27 17:30:21.355034 7f3b64579700 journal check_for_full at 415199232
> : JOURNAL FULL 415199232 >= 778239 (max_size 996147200 start 415977472)
> 2011-05-27 17:30:23.441445 7f3b5d46a700 journal throttle: waited for bytes
> 2011-05-27 17:30:36.362877 7f3b64579700 journal check_for_full at 414326784
> : JOURNAL FULL 414326784 >= 872447 (max_size 996147200 start 415199232)
> 2011-05-27 17:30:38.391372 7f3b5d46a700 journal throttle: waited for bytes
> 2011-05-27 17:30:50.373936 7f3b64579700 journal check_for_full at 414314496
> : JOURNAL FULL 414314496 >= 12287 (max_size 996147200 start 414326784)
> ./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*) [with
> T = PG*]', in thread '0x7f3b5cc69700'
> ./include/xlist.h: 107: FAILED assert(i->_list == this)
>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>  3: (ThreadPool::worker()+0x10a) [0x65799a]
>  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>  5: (()+0x6d8c) [0x7f3b697b5d8c]
>  6: (clone()+0x6d) [0x7f3b6866804d]
>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>  3: (ThreadPool::worker()+0x10a) [0x65799a]
>  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>  5: (()+0x6d8c) [0x7f3b697b5d8c]
>  6: (clone()+0x6d) [0x7f3b6866804d]
> *** Caught signal (Aborted) **
>  in thread 0x7f3b5cc69700
>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>  1: /usr/bin/cosd() [0x6729f9]
>  2: (()+0xfc60) [0x7f3b697bec60]
>  3: (gsignal()+0x35) [0x7f3b685b5d05]
>  4: (abort()+0x186) [0x7f3b685b9ab6]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3b68e6c6dd]
>  6: (()+0xb9926) [0x7f3b68e6a926]
>  7: (()+0xb9953) [0x7f3b68e6a953]
>  8: (()+0xb9a5e) [0x7f3b68e6aa5e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x362) [0x655e32]
>  10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>  11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>  12: (ThreadPool::worker()+0x10a) [0x65799a]
>  13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>  14: (()+0x6d8c) [0x7f3b697b5d8c]
>  15: (clone()+0x6d) [0x7f3b6866804d]
>
> I.e. it's not "easy reproduced" bug. While I had less data in the cluster -
> I not seen this error.
>
> I think that I do not have enough space for "full" log for 2-3 hours. Sorry.
>
> WBR,
>    Fyodor.
>


* Re: OSD crash
  2011-05-27 16:49     ` Gregory Farnum
@ 2011-05-27 19:18       ` Gregory Farnum
  2011-05-27 19:30         ` Fyodor Ustinov
  2011-05-27 22:52         ` Fyodor Ustinov
  0 siblings, 2 replies; 28+ messages in thread
From: Gregory Farnum @ 2011-05-27 19:18 UTC (permalink / raw)
  To: Fyodor Ustinov; +Cc: ceph-devel

Can you check out the recoverywq_fix branch and see if that prevents
this issue? Or just apply the patch I've included below. :)
-Greg

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 2224e67..41c09d6 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -4789,11 +4789,17 @@ void OSD::do_recovery(PG *pg)
        do_queries(query_map);
       else {
        dout(10) << "do_recovery  no luck, giving up on this pg for now" << dendl;
+       recovery_wq.lock();
        pg->recovery_item.remove_myself();      // sigh...
+       recovery_wq.unlock();
+
       }
     }
-    else if (started < max)
+    else if (started < max) {
+      recovery_wq.lock();
       pg->recovery_item.remove_myself();
+      recovery_wq.unlock();
+    }

     pg->unlock();
   }
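
(To spell out the intent: RecoveryWQ::_dequeue() pops recovery_item entries
off the queue under the workqueue lock, so the two places in do_recovery()
that call pg->recovery_item.remove_myself() directly while holding only the
PG lock are now bracketed with recovery_wq.lock()/unlock(), and can no
longer race with that pop_front().)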


On Fri, May 27, 2011 at 9:49 AM, Gregory Farnum <gregf@hq.newdream.net> wrote:
> Okay, one of us should have a patch for you later today to try out;
> the only cause I can come up with seems unlikely but there aren't many
> options so it should be pretty simple. :)
> I opened a tracker issue for it: http://tracker.newdream.net/issues/1116
> -Greg
>
> On Fri, May 27, 2011 at 9:41 AM, Fyodor Ustinov <ufm@ufm.su> wrote:
>> On 05/27/2011 06:16 PM, Gregory Farnum wrote:
>>>
>>> This is an interesting one -- the invariant that assert is checking
>>> isn't too complicated (that the object lives on the RecoveryWQ's
>>> queue) and seems to hold everywhere the RecoveryWQ is called. And the
>>> functions modifying the queue are always called under the workqueue
>>> lock, and do maintenance if the xlist::item is on a different list.
>>> Which makes me think that the problem must be from conflating the
>>> RecoveryWQ lock and the PG lock in the few places that modify the
>>> PG::recovery_item directly, rather than via RecoveryWQ functions.
>>> Anybody more familiar than me with this have ideas?
>>> Fyodor, based on the time stamps and output you've given us, I assume
>>> you don't have more detailed logs?
>>> -Greg
>>
>> Greg, i got this crash again.
>> Let me tell you the configuration and what is happening:
>> Configuration:
>> 6 osd servers. 4G RAM, 4*1T hdd (mdadmed to raid0), 2*1G etherchannel
>> ethernet, Ubuntu server 11.04/64  with kernel 2.6.39 (hand compiled)
>> mon+mds server 24G RAM, the same os.
>>
>> On each OSD Journal placed on 1G tempfs. OSD data - on xfs in this case.
>>
>> Configuration file:
>>
>> [global]
>>        max open files = 131072
>>        log file = /var/log/ceph/$name.log
>>        pid file = /var/run/ceph/$name.pid
>>
>> [mon]
>>        mon data = /mfs/mon$id
>>
>> [mon.0]
>>        mon addr  = 10.5.51.230:6789
>>
>> [mds]
>>        keyring = /mfs/mds/keyring.$name
>>
>> [mds.0]
>>        host = mds0
>>
>>
>> [osd]
>>        osd data = /$name
>>        osd journal = /journal/$name
>>        osd journal size = 950
>>        journal dio = false
>>
>> [osd.0]
>>        host = osd0
>>        cluster addr = 10.5.51.10
>>        public addr = 10.5.51.140
>>
>> [osd.1]
>>        host = osd1
>>        cluster addr = 10.5.51.11
>>        public addr = 10.5.51.141
>>
>> [osd.2]
>>        host = osd2
>>        cluster addr = 10.5.51.12
>>        public addr = 10.5.51.142
>>
>> [osd.3]
>>        host = osd3
>>        cluster addr = 10.5.51.13
>>        public addr = 10.5.51.143
>>
>> [osd.4]
>>        host = osd4
>>        cluster addr = 10.5.51.14
>>        public addr = 10.5.51.144
>>
>> [osd.5]
>>        host = osd5
>>        cluster addr = 10.5.51.15
>>        public addr = 10.5.51.145
>>
>> What happening:
>> osd2 was crashed, rebooted, osd data and journal created from scratch by
>> "cosd --mkfs -i 2 --monmap /tmp/monmap" and server started.
>> Additional - on osd2 enables "writeahaed", but I think it's not principal in
>> this case.
>>
>> Well, server start rebalancing:
>>
>> 2011-05-27 15:12:49.323558 7f3b69de5740 ceph version 0.28.1.commit:
>> d66c6ca19bbde3c363b135b66072de44e67c6632. process: cosd. pid: 1694
>> 2011-05-27 15:12:49.325331 7f3b69de5740 filestore(/osd.2) mount FIEMAP ioctl
>> is NOT supported
>> 2011-05-27 15:12:49.325378 7f3b69de5740 filestore(/osd.2) mount did NOT
>> detect btrfs
>> 2011-05-27 15:12:49.325467 7f3b69de5740 filestore(/osd.2) mount found snaps
>> <>
>> 2011-05-27 15:12:49.325512 7f3b69de5740 filestore(/osd.2) mount: WRITEAHEAD
>> journal mode explicitly enabled in conf
>> 2011-05-27 15:12:49.325526 7f3b69de5740 filestore(/osd.2) mount WARNING: not
>> btrfs or ext3; data may be lost
>> 2011-05-27 15:12:49.325606 7f3b69de5740 journal _open /journal/osd.2 fd 11:
>> 996147200 bytes, block size 4096 bytes, directio = 0
>> 2011-05-27 15:12:49.325641 7f3b69de5740 journal read_entry 4096 : seq 1 203
>> bytes
>> 2011-05-27 15:12:49.325698 7f3b69de5740 journal _open /journal/osd.2 fd 11:
>> 996147200 bytes, block size 4096 bytes, directio = 0
>> 2011-05-27 15:12:49.544716 7f3b59656700 -- 10.5.51.12:6801/1694 >>
>> 10.5.51.14:6801/5070 pipe(0x1239d20 sd=27 pgs=0 cs=0 l=0).accept we reset
>> (peer sent cseq 2), sending RESETSESSION
>> 2011-05-27 15:12:49.544798 7f3b59c5c700 -- 10.5.51.12:6801/1694 >>
>> 10.5.51.13:6801/5165 pipe(0x104b950 sd=14 pgs=0 cs=0 l=0).accept we reset
>> (peer sent cseq 2), sending RESETSESSION
>> 2011-05-27 15:12:49.544864 7f3b59757700 -- 10.5.51.12:6801/1694 >>
>> 10.5.51.15:6801/1574 pipe(0x11e7cd0 sd=16 pgs=0 cs=0 l=0).accept we reset
>> (peer sent cseq 2), sending RESETSESSION
>> 2011-05-27 15:12:49.544909 7f3b59959700 -- 10.5.51.12:6801/1694 >>
>> 10.5.51.10:6801/6148 pipe(0x11d7d30 sd=15 pgs=0 cs=0 l=0).accept we reset
>> (peer sent cseq 2), sending RESETSESSION
>> 2011-05-27 15:13:23.015637 7f3b64579700 journal check_for_full at 66404352 :
>> JOURNAL FULL 66404352 >= 851967 (max_size 996147200 start 67256320)
>> 2011-05-27 15:13:25.586081 7f3b5dc6b700 journal throttle: waited for bytes
>> 2011-05-27 15:13:25.601789 7f3b5d46a700 journal throttle: waited for bytes
>>
>> [...] and after 2 hours:
>>
>> 2011-05-27 17:30:21.355034 7f3b64579700 journal check_for_full at 415199232
>> : JOURNAL FULL 415199232 >= 778239 (max_size 996147200 start 415977472)
>> 2011-05-27 17:30:23.441445 7f3b5d46a700 journal throttle: waited for bytes
>> 2011-05-27 17:30:36.362877 7f3b64579700 journal check_for_full at 414326784
>> : JOURNAL FULL 414326784 >= 872447 (max_size 996147200 start 415199232)
>> 2011-05-27 17:30:38.391372 7f3b5d46a700 journal throttle: waited for bytes
>> 2011-05-27 17:30:50.373936 7f3b64579700 journal check_for_full at 414314496
>> : JOURNAL FULL 414314496 >= 12287 (max_size 996147200 start 414326784)
>> ./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*) [with
>> T = PG*]', in thread '0x7f3b5cc69700'
>> ./include/xlist.h: 107: FAILED assert(i->_list == this)
>>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>>  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>>  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>>  3: (ThreadPool::worker()+0x10a) [0x65799a]
>>  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>>  5: (()+0x6d8c) [0x7f3b697b5d8c]
>>  6: (clone()+0x6d) [0x7f3b6866804d]
>>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>>  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>>  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>>  3: (ThreadPool::worker()+0x10a) [0x65799a]
>>  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>>  5: (()+0x6d8c) [0x7f3b697b5d8c]
>>  6: (clone()+0x6d) [0x7f3b6866804d]
>> *** Caught signal (Aborted) **
>>  in thread 0x7f3b5cc69700
>>  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
>>  1: /usr/bin/cosd() [0x6729f9]
>>  2: (()+0xfc60) [0x7f3b697bec60]
>>  3: (gsignal()+0x35) [0x7f3b685b5d05]
>>  4: (abort()+0x186) [0x7f3b685b9ab6]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3b68e6c6dd]
>>  6: (()+0xb9926) [0x7f3b68e6a926]
>>  7: (()+0xb9953) [0x7f3b68e6a953]
>>  8: (()+0xb9a5e) [0x7f3b68e6aa5e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x362) [0x655e32]
>>  10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
>>  11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
>>  12: (ThreadPool::worker()+0x10a) [0x65799a]
>>  13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
>>  14: (()+0x6d8c) [0x7f3b697b5d8c]
>>  15: (clone()+0x6d) [0x7f3b6866804d]
>>
>> I.e. it's not "easy reproduced" bug. While I had less data in the cluster -
>> I not seen this error.
>>
>> I think that I do not have enough space for "full" log for 2-3 hours. Sorry.
>>
>> WBR,
>>    Fyodor.
>>


* Re: OSD crash
  2011-05-27 19:18       ` Gregory Farnum
@ 2011-05-27 19:30         ` Fyodor Ustinov
  2011-05-27 22:52         ` Fyodor Ustinov
  1 sibling, 0 replies; 28+ messages in thread
From: Fyodor Ustinov @ 2011-05-27 19:30 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On 05/27/2011 10:18 PM, Gregory Farnum wrote:
> Can you check out the recoverywq_fix branch and see if that prevents
> this issue? Or just apply the patch I've included below. :)
> -Greg
>
OK, I will report back with the result.

WBR,
     Fyodor.


* Re: OSD crash
  2011-05-27 19:18       ` Gregory Farnum
  2011-05-27 19:30         ` Fyodor Ustinov
@ 2011-05-27 22:52         ` Fyodor Ustinov
  1 sibling, 0 replies; 28+ messages in thread
From: Fyodor Ustinov @ 2011-05-27 22:52 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On 05/27/2011 10:18 PM, Gregory Farnum wrote:
> Can you check out the recoverywq_fix branch and see if that prevents
> this issue? Or just apply the patch I've included below. :)
> -Greg

Looks as though this patch has helped.
At least this osd has completed rebalancing.
Great! Thanks!

WBR,
     Fyodor.


* osd crash
@ 2020-09-07 16:42 Kaarlo Lahtela
  0 siblings, 0 replies; 28+ messages in thread
From: Kaarlo Lahtela @ 2020-09-07 16:42 UTC (permalink / raw)
  To: ceph-devel

Hi,
two of my OSDs on different nodes do not start, and now I have one pg
that is down. This happened on ceph version 14.2.10 and still happens
after upgrading to 14.2.11. I get this error when starting the osd:

===============8<========================
root@prox:~# /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph
--setgroup ceph
2020-09-05 13:53:15.077 7fd0fca43c80 -1 osd.1 6645 log_to_monitors
{default=true}
2020-09-05 13:53:15.189 7fd0f5d4b700 -1 osd.1 6687 set_numa_affinity
unable to identify public interface 'vmbr0' numa node: (2) No such
file or directory
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: In function
'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread
7fd0e2524700 time 2020-09-05 13:53:17.980687
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: 5450: FAILED
ceph_assert(clone_overlap.count(clone))
 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x152) [0x562cc5ea83c8]
 2: (()+0x5115a0) [0x562cc5ea85a0]
 3: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 4: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 5: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 6: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 8: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 12: (()+0x7fa3) [0x7fd0fd487fa3]
 13: (clone()+0x3f) [0x7fd0fd0374cf]
*** Caught signal (Aborted) **
 in thread 7fd0e2524700 thread_name:tp_osd_tp
2020-09-05 13:53:17.977 7fd0e2524700 -1
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: In function
'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread
7fd0e2524700 time 2020-09-05 13:53:17.980687
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: 5450: FAILED
ceph_assert(clone_overlap.count(clone))

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x152) [0x562cc5ea83c8]
 2: (()+0x5115a0) [0x562cc5ea85a0]
 3: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 4: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 5: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 6: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 8: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 12: (()+0x7fa3) [0x7fd0fd487fa3]
 13: (clone()+0x3f) [0x7fd0fd0374cf]

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (()+0x12730) [0x7fd0fd492730]
 2: (gsignal()+0x10b) [0x7fd0fcf757bb]
 3: (abort()+0x121) [0x7fd0fcf60535]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a3) [0x562cc5ea8419]
 5: (()+0x5115a0) [0x562cc5ea85a0]
 6: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 7: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 8: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 9: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 10: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 11: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 15: (()+0x7fa3) [0x7fd0fd487fa3]
 16: (clone()+0x3f) [0x7fd0fd0374cf]
2020-09-05 13:53:17.981 7fd0e2524700 -1 *** Caught signal (Aborted) **
 in thread 7fd0e2524700 thread_name:tp_osd_tp

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (()+0x12730) [0x7fd0fd492730]
 2: (gsignal()+0x10b) [0x7fd0fcf757bb]
 3: (abort()+0x121) [0x7fd0fcf60535]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a3) [0x562cc5ea8419]
 5: (()+0x5115a0) [0x562cc5ea85a0]
 6: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 7: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 8: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 9: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 10: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 11: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 15: (()+0x7fa3) [0x7fd0fd487fa3]
 16: (clone()+0x3f) [0x7fd0fd0374cf]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

  -469> 2020-09-05 13:53:15.077 7fd0fca43c80 -1 osd.1 6645
log_to_monitors {default=true}
  -195> 2020-09-05 13:53:15.189 7fd0f5d4b700 -1 osd.1 6687
set_numa_affinity unable to identify public interface 'vmbr0' numa
node: (2) No such file or directory
    -1> 2020-09-05 13:53:17.977 7fd0e2524700 -1
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: In function
'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread
7fd0e2524700 time 2020-09-05 13:53:17.980687
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: 5450: FAILED
ceph_assert(clone_overlap.count(clone))

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x152) [0x562cc5ea83c8]
 2: (()+0x5115a0) [0x562cc5ea85a0]
 3: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 4: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 5: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 6: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 8: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 12: (()+0x7fa3) [0x7fd0fd487fa3]
 13: (clone()+0x3f) [0x7fd0fd0374cf]

     0> 2020-09-05 13:53:17.981 7fd0e2524700 -1 *** Caught signal (Aborted) **
 in thread 7fd0e2524700 thread_name:tp_osd_tp

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (()+0x12730) [0x7fd0fd492730]
 2: (gsignal()+0x10b) [0x7fd0fcf757bb]
 3: (abort()+0x121) [0x7fd0fcf60535]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a3) [0x562cc5ea8419]
 5: (()+0x5115a0) [0x562cc5ea85a0]
 6: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 7: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 8: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 9: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 10: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 11: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 15: (()+0x7fa3) [0x7fd0fd487fa3]
 16: (clone()+0x3f) [0x7fd0fd0374cf]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

  -469> 2020-09-05 13:53:15.077 7fd0fca43c80 -1 osd.1 6645
log_to_monitors {default=true}
  -195> 2020-09-05 13:53:15.189 7fd0f5d4b700 -1 osd.1 6687
set_numa_affinity unable to identify public interface 'vmbr0' numa
node: (2) No such file or directory
    -1> 2020-09-05 13:53:17.977 7fd0e2524700 -1
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: In function
'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread
7fd0e2524700 time 2020-09-05 13:53:17.980687
/build/ceph-JY24tx/ceph-14.2.11/src/osd/osd_types.cc: 5450: FAILED
ceph_assert(clone_overlap.count(clone))

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x152) [0x562cc5ea83c8]
 2: (()+0x5115a0) [0x562cc5ea85a0]
 3: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 4: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 5: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 6: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 7: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 8: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 9: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 12: (()+0x7fa3) [0x7fd0fd487fa3]
 13: (clone()+0x3f) [0x7fd0fd0374cf]

     0> 2020-09-05 13:53:17.981 7fd0e2524700 -1 *** Caught signal (Aborted) **
 in thread 7fd0e2524700 thread_name:tp_osd_tp

 ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94)
nautilus (stable)
 1: (()+0x12730) [0x7fd0fd492730]
 2: (gsignal()+0x10b) [0x7fd0fcf757bb]
 3: (abort()+0x121) [0x7fd0fcf60535]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a3) [0x562cc5ea8419]
 5: (()+0x5115a0) [0x562cc5ea85a0]
 6: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x562cc61dc432]
 7: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>,
pg_stat_t*)+0x297) [0x562cc6107fb7]
 8: (PrimaryLogPG::recover_backfill(unsigned long,
ThreadPool::TPHandle&, bool*)+0xfdc) [0x562cc6136a3c]
 9: (PrimaryLogPG::start_recovery_ops(unsigned long,
ThreadPool::TPHandle&, unsigned long*)+0x1173) [0x562cc613ab43]
 10: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x302) [0x562cc5f8b622]
 11: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x19) [0x562cc622fac9]
 12: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x7d7) [0x562cc5fa7ba7]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4)
[0x562cc65740c4]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562cc6576ad0]
 15: (()+0x7fa3) [0x7fd0fd487fa3]
 16: (clone()+0x3f) [0x7fd0fd0374cf]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

Aborted
===============8<========================
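
For context, the check that fails here is in SnapSet::get_clone_bytes(). Very
roughly (a simplified sketch with assumed field names, not the real
osd_types.cc code), the SnapSet carries per-clone metadata maps and the
function asserts that the clone it is asked about actually has entries:

#include <cassert>
#include <cstdint>
#include <map>

using snapid_t = uint64_t;

struct SnapSetSketch {
  std::map<snapid_t, uint64_t> clone_size;     // bytes recorded per clone
  std::map<snapid_t, uint64_t> clone_overlap;  // stand-in for per-clone overlap data

  uint64_t get_clone_bytes(snapid_t clone) const {
    // The failure above means a lookup like this found nothing: the object's
    // SnapSet lists a clone that has no matching overlap entry, i.e. the
    // snapset metadata for that object is internally inconsistent on this OSD.
    assert(clone_overlap.count(clone));
    return clone_size.at(clone);  // the real code also accounts for overlap
  }
};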

What can I do to recover my osds?

-- 
</kaarlo>


* OSD Crash
       [not found] <8566685.312.1362419807745.JavaMail.dspano@it1>
@ 2013-03-04 18:02 ` Dave Spano
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Spano @ 2013-03-04 18:02 UTC (permalink / raw)
  To: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 525 bytes --]

I had one of my OSDs crash yesterday. I'm using ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5). 

The part of the log file where the crash happened is attached. Not really sure what led up to it, but I did get an alert from my server monitor telling me my swap space got really low around the time it crashed. 

The OSD reconnected after restarting the service. Currently, I'm waiting patiently as 1 of my 400 pgs gets out of active+clean+scrubbing status. 

Dave Spano 
Optogenics 
Systems Administrator 



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: osd.0.log --]
[-- Type: text/x-log; name=osd.0.log, Size: 6851 bytes --]

   -17> 2013-03-03 13:02:13.478152 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.13039.0:6860359, seq: 5393222, time: 2013-03-03 13:02:13.478134, event: write_thread_in_journal_buffer, request: osd_sub_op(client.13039.0:6860359 3.0 a10c17c8/rb.0.2dd7.16d28c4f.00000000002f/head//3 [] v 411'1980074 snapset=0=[]:[] snapc=0=[]) v7
   -16> 2013-03-03 13:02:13.478153 7f5d559ab700  1 -- 192.168.3.11:6801/4500 --> osd.1 192.168.3.12:6802/2467 -- osd_sub_op_reply(client.14000.1:570700 0.16 5e01a96/100003797f2.00000000/head//0 [] ondisk, result = 0) v1 -- ?+0 0xc45cc80
   -15> 2013-03-03 13:02:13.478184 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.14000.1:570701, seq: 5393223, time: 2013-03-03 13:02:13.478184, event: write_thread_in_journal_buffer, request: osd_sub_op(client.14000.1:570701 0.22 40dccca2/100001164ca.00000002/head//0 [] v 411'447369 snapset=0=[]:[] snapc=0=[]) v7
   -14> 2013-03-03 13:02:13.478209 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625658, seq: 5393225, time: 2013-03-03 13:02:13.478209, event: write_thread_in_journal_buffer, request: osd_sub_op(client.11755.0:2625658 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095529 snapset=0=[]:[] snapc=0=[]) v7
   -13> 2013-03-03 13:02:13.478234 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625659, seq: 5393226, time: 2013-03-03 13:02:13.478234, event: write_thread_in_journal_buffer, request: osd_sub_op(client.11755.0:2625659 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095530 snapset=0=[]:[] snapc=0=[]) v7
   -12> 2013-03-03 13:02:13.484696 7f5d549a9700  1 -- 192.168.3.11:6800/4500 <== client.11755 192.168.1.64:0/1062411 90128 ==== ping v1 ==== 0+0+0 (0 0 0) 0xff4e000 con 0x307a6e0
   -11> 2013-03-03 13:02:13.489457 7f5d4f99f700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.489457, event: started, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
   -10> 2013-03-03 13:02:13.489503 7f5d4f99f700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.489503, event: commit_queued_for_journal_write, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
    -9> 2013-03-03 13:02:13.571632 7f5d501a0700  5 --OSD::tracker-- reqid: client.11755.0:2625657, seq: 5393224, time: 2013-03-03 13:02:13.571631, event: started, request: osd_op(client.11755.0:2625657 rb.0.2ea4.614c277f.00000000003d [write 1253376~4096] 3.c7bd6ff1) v4
    -8> 2013-03-03 13:02:13.571661 7f5d501a0700  5 --OSD::tracker-- reqid: client.11755.0:2625657, seq: 5393224, time: 2013-03-03 13:02:13.571661, event: started, request: osd_op(client.11755.0:2625657 rb.0.2ea4.614c277f.00000000003d [write 1253376~4096] 3.c7bd6ff1) v4
    -7> 2013-03-03 13:02:13.571733 7f5d501a0700  5 --OSD::tracker-- reqid: client.11755.0:2625657, seq: 5393224, time: 2013-03-03 13:02:13.571733, event: waiting for subops from [1], request: osd_op(client.11755.0:2625657 rb.0.2ea4.614c277f.00000000003d [write 1253376~4096] 3.c7bd6ff1) v4
    -6> 2013-03-03 13:02:13.598028 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.13039.0:6860359, seq: 5393222, time: 2013-03-03 13:02:13.598027, event: journaled_completion_queued, request: osd_sub_op(client.13039.0:6860359 3.0 a10c17c8/rb.0.2dd7.16d28c4f.00000000002f/head//3 [] v 411'1980074 snapset=0=[]:[] snapc=0=[]) v7
    -5> 2013-03-03 13:02:13.598061 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.14000.1:570701, seq: 5393223, time: 2013-03-03 13:02:13.598061, event: journaled_completion_queued, request: osd_sub_op(client.14000.1:570701 0.22 40dccca2/100001164ca.00000002/head//0 [] v 411'447369 snapset=0=[]:[] snapc=0=[]) v7
    -4> 2013-03-03 13:02:13.598081 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625658, seq: 5393225, time: 2013-03-03 13:02:13.598081, event: journaled_completion_queued, request: osd_sub_op(client.11755.0:2625658 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095529 snapset=0=[]:[] snapc=0=[]) v7
    -3> 2013-03-03 13:02:13.598098 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625659, seq: 5393226, time: 2013-03-03 13:02:13.598098, event: journaled_completion_queued, request: osd_sub_op(client.11755.0:2625659 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095530 snapset=0=[]:[] snapc=0=[]) v7
    -2> 2013-03-03 13:02:13.598134 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.598134, event: write_thread_in_journal_buffer, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
    -1> 2013-03-03 13:02:13.598257 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.598257, event: journaled_completion_queued, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
     0> 2013-03-03 13:02:13.753064 7f5d4c097700 -1 *** Caught signal (Aborted) **
 in thread 7f5d4c097700

 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: /usr/bin/ceph-osd() [0x78430a]
 2: (()+0xfcb0) [0x7f5d60fc3cb0]
 3: (gsignal()+0x35) [0x7f5d5f982425]
 4: (abort()+0x17b) [0x7f5d5f985b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f5d602d469d]
 6: (()+0xb5846) [0x7f5d602d2846]
 7: (()+0xb5873) [0x7f5d602d2873]
 8: (()+0xb596e) [0x7f5d602d296e]
 9: (ceph::buffer::create_page_aligned(unsigned int)+0x95) [0x82ef25]
 10: (Pipe::read_message(Message**)+0x2421) [0x8d3591]
 11: (Pipe::reader()+0x8c2) [0x8e3db2]
 12: (Pipe::Reader::entry()+0xd) [0x8e668d]
 13: (()+0x7e9a) [0x7f5d60fbbe9a]
 14: (clone()+0x6d) [0x7f5d5fa3fcbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent    100000
  max_new         1000
  log_file /var/log/ceph/osd.0.log
--- end dump of recent events ---
root@ha1:/var/log/ceph# 



* Re: OSD crash
  2012-09-04  8:13           ` Andrey Korolyov
@ 2012-09-04 15:32             ` Sage Weil
  0 siblings, 0 replies; 28+ messages in thread
From: Sage Weil @ 2012-09-04 15:32 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Gregory Farnum, ceph-devel

On Tue, 4 Sep 2012, Andrey Korolyov wrote:
> Hi,
> 
> Almost always one or more osd dies when doing overlapped recovery -
> e.g. add new crushmap and remove some newly added osds from cluster
> some minutes later during remap or inject two slightly different
> crushmaps after a short time(surely preserving at least one of
> replicas online). Seems that osd dying on excessive amount of
> operations in queue because under normal test, e.g. rados, iowait does
> not break one percent barrier but during recovery it may raise up to
> ten percents(2108 w/ cache, splitted disks as R0 each).
> 
> #0  0x00007f62f193a445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f62f193db9b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007f62f2236665 in __gnu_cxx::__verbose_terminate_handler() ()
> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x00007f62f2234796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x00007f62f22347c3 in std::terminate() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x00007f62f22349ee in __cxa_throw () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x0000000000844e11 in ceph::__ceph_assert_fail(char const*, char
> const*, int, char const*) ()
> #7  0x000000000073148f in
> FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
> int) ()

Can you install debug symbols to see what line number this is on (e.g. 
apt-get install ceph-dbg), or check the log file to see what the assert 
failure is?

Thanks!
sage


> #8  0x000000000073484e in
> FileStore::do_transactions(std::list<ObjectStore::Transaction*,
> std::allocator<ObjectStore::Transaction*> >&, unsigned long) ()
> #9  0x000000000070c680 in FileStore::_do_op(FileStore::OpSequencer*) ()
> #10 0x000000000083ce01 in ThreadPool::worker() ()
> #11 0x00000000006823ed in ThreadPool::WorkThread::entry() ()
> #12 0x00007f62f345ee9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007f62f19f64cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #14 0x0000000000000000 in ?? ()
> ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
> 
> On Sun, Aug 26, 2012 at 8:52 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> > During recovery, following crash happens(simular to
> > http://tracker.newdream.net/issues/2126 which marked resolved long
> > ago):
> >
> > http://xdel.ru/downloads/ceph-log/osd-2012-08-26.txt
> >
> > On Sat, Aug 25, 2012 at 12:30 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> >> On Thu, Aug 23, 2012 at 4:09 AM, Gregory Farnum <greg@inktank.com> wrote:
> >>> The tcmalloc backtrace on the OSD suggests this may be unrelated, but
> >>> what's the fd limit on your monitor process? You may be approaching
> >>> that limit if you've got 500 OSDs and a similar number of clients.
> >>>
> >>
> >> Thanks! I didn`t measured a # of connection because of bearing in mind
> >> 1 conn per client, raising limit did the thing. Previously mentioned
> >> qemu-kvm zombie does not related to rbd itself - it can be created by
> >> destroying libvirt domain which is in saving state or vice-versa, so
> >> I`ll put a workaround on this. Right now I am faced different problem
> >> - osds dying silently, e.g. not leaving a core, I`ll check logs on the
> >> next testing phase.
> >>
> >>> On Wed, Aug 22, 2012 at 6:55 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> >>>> On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
> >>>>> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> today during heavy test a pair of osds and one mon died, resulting to
> >>>>>> hard lockup of some kvm processes - they went unresponsible and was
> >>>>>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
> >>>>>> contain sixteen osd on eight nodes and three mons, on first and last
> >>>>>> node and on vm outside cluster.
> >>>>>>
> >>>>>> osd bt:
> >>>>>> #0  0x00007fc37d490be3 in
> >>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> >>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> >>>>>> (gdb) bt
> >>>>>> #0  0x00007fc37d490be3 in
> >>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> >>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> >>>>>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
> >>>>>> /usr/lib/libtcmalloc.so.4
> >>>>>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
> >>>>>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
> >>>>>> /usr/include/c++/4.7/bits/basic_string.h:246
> >>>>>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
> >>>>>> /usr/include/c++/4.7/bits/basic_string.h:536
> >>>>>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
> >>>>>> at /usr/include/c++/4.7/sstream:60
> >>>>>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
> >>>>>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
> >>>>>> #7  pretty_version_to_str () at common/version.cc:40
> >>>>>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
> >>>>>> out=...) at common/BackTrace.cc:19
> >>>>>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
> >>>>>> global/signal_handler.cc:91
> >>>>>> #10 <signal handler called>
> >>>>>> #11 0x00007fc37d490be3 in
> >>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> >>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> >>>>>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
> >>>>>> /usr/lib/libtcmalloc.so.4
> >>>>>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
> >>>>>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
> >>>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>>> #16 0x00007fc37d1c47c3 in std::terminate() () from
> >>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>>> #17 0x00007fc37d1c49ee in __cxa_throw () from
> >>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
> >>>>>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
> >>>>>>     func=0x90ef80 "unsigned int
> >>>>>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
> >>>>>> at common/assert.cc:77
> >>>>>
> >>>>> This means it got an unexpected error when talking to the file system.  If
> >>>>> you look in the osd log, it may tell you what that was.  (It may
> >>>>> not--there isn't usually the other tcmalloc stuff triggered from the
> >>>>> assert handler.)
> >>>>>
> >>>>> What happens if you restart that ceph-osd daemon?
> >>>>>
> >>>>> sage
> >>>>>
> >>>>>
> >>>>
> >>>> Unfortunately I have completely disabled logs during test, so there
> >>>> are no suggestion of assert_fail. The main problem was revealed -
> >>>> created VMs was pointed to one monitor instead set of three, so there
> >>>> may be some unusual things(btw, crashed mon isn`t one from above, but
> >>>> a neighbor of crashed osds on first node). After IPMI reset node
> >>>> returns back well and cluster behavior seems to be okay - stuck kvm
> >>>> I/O somehow prevented even other module load|unload on this node, so I
> >>>> finally decided to do hard reset. Despite I`m using almost generic
> >>>> wheezy, glibc was updated to 2.15, may be because of this my trace
> >>>> appears first time ever. I`m almost sure that fs does not triggered
> >>>> this crash and mainly suspecting stuck kvm processes. I`ll rerun test
> >>>> with same conditions tomorrow(~500 vms pointed to one mon and very
> >>>> high I/O, but with osd logging).
> >>>>
> >>>>>> #19 0x000000000073148f in FileStore::_do_transaction
> >>>>>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
> >>>>>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
> >>>>>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
> >>>>>> tls=..., op_seq=429545) at os/FileStore.cc:2436
> >>>>>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
> >>>>>> osr=<optimized out>) at os/FileStore.cc:2259
> >>>>>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
> >>>>>> common/WorkQueue.cc:54
> >>>>>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
> >>>>>> (this=<optimized out>) at ./common/WorkQueue.h:126
> >>>>>> #24 0x00007fc37e3eee9a in start_thread () from
> >>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
> >>>>>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >>>>>> #26 0x0000000000000000 in ?? ()
> >>>>>>
> >>>>>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-26 16:52         ` Andrey Korolyov
  2012-08-26 20:44           ` Sage Weil
@ 2012-09-04  8:13           ` Andrey Korolyov
  2012-09-04 15:32             ` Sage Weil
  1 sibling, 1 reply; 28+ messages in thread
From: Andrey Korolyov @ 2012-09-04  8:13 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel

Hi,

Almost always one or more OSDs die when doing overlapped recovery -
e.g. injecting a new crushmap and then removing some of the newly added
OSDs a few minutes later while the remap is still running, or injecting
two slightly different crushmaps within a short interval (always keeping
at least one replica online). The OSDs seem to die from the excessive
number of operations queued during recovery: under a normal test (e.g.
rados) iowait never breaks the one percent barrier, but during recovery
it can rise to about ten percent (2108 controller with cache, each disk
exported as a separate R0).
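
The sequence that triggers it looks roughly like this (the map file name
and osd id are only examples):

  ceph osd setcrushmap -i new-crush.map   # inject the new crushmap
  # wait a few minutes while the remap/recovery runs
  ceph osd out 12                         # drop one of the newly added osds mid-recovery
  ceph -w                                 # one or more osds abort with the bt below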

#0  0x00007f62f193a445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f62f193db9b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f62f2236665 in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f62f2234796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f62f22347c3 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f62f22349ee in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x0000000000844e11 in ceph::__ceph_assert_fail(char const*, char
const*, int, char const*) ()
#7  0x000000000073148f in
FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
int) ()
#8  0x000000000073484e in
FileStore::do_transactions(std::list<ObjectStore::Transaction*,
std::allocator<ObjectStore::Transaction*> >&, unsigned long) ()
#9  0x000000000070c680 in FileStore::_do_op(FileStore::OpSequencer*) ()
#10 0x000000000083ce01 in ThreadPool::worker() ()
#11 0x00000000006823ed in ThreadPool::WorkThread::entry() ()
#12 0x00007f62f345ee9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f62f19f64cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)

On Sun, Aug 26, 2012 at 8:52 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> During recovery, following crash happens(simular to
> http://tracker.newdream.net/issues/2126 which marked resolved long
> ago):
>
> http://xdel.ru/downloads/ceph-log/osd-2012-08-26.txt
>
> On Sat, Aug 25, 2012 at 12:30 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> On Thu, Aug 23, 2012 at 4:09 AM, Gregory Farnum <greg@inktank.com> wrote:
>>> The tcmalloc backtrace on the OSD suggests this may be unrelated, but
>>> what's the fd limit on your monitor process? You may be approaching
>>> that limit if you've got 500 OSDs and a similar number of clients.
>>>
>>
>> Thanks! I didn`t measured a # of connection because of bearing in mind
>> 1 conn per client, raising limit did the thing. Previously mentioned
>> qemu-kvm zombie does not related to rbd itself - it can be created by
>> destroying libvirt domain which is in saving state or vice-versa, so
>> I`ll put a workaround on this. Right now I am faced different problem
>> - osds dying silently, e.g. not leaving a core, I`ll check logs on the
>> next testing phase.
>>
>>> On Wed, Aug 22, 2012 at 6:55 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>>>> On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
>>>>> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> today during heavy test a pair of osds and one mon died, resulting to
>>>>>> hard lockup of some kvm processes - they went unresponsible and was
>>>>>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
>>>>>> contain sixteen osd on eight nodes and three mons, on first and last
>>>>>> node and on vm outside cluster.
>>>>>>
>>>>>> osd bt:
>>>>>> #0  0x00007fc37d490be3 in
>>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>>>> (gdb) bt
>>>>>> #0  0x00007fc37d490be3 in
>>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>>>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>>>>> /usr/lib/libtcmalloc.so.4
>>>>>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
>>>>>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
>>>>>> /usr/include/c++/4.7/bits/basic_string.h:246
>>>>>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
>>>>>> /usr/include/c++/4.7/bits/basic_string.h:536
>>>>>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
>>>>>> at /usr/include/c++/4.7/sstream:60
>>>>>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
>>>>>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
>>>>>> #7  pretty_version_to_str () at common/version.cc:40
>>>>>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
>>>>>> out=...) at common/BackTrace.cc:19
>>>>>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
>>>>>> global/signal_handler.cc:91
>>>>>> #10 <signal handler called>
>>>>>> #11 0x00007fc37d490be3 in
>>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>>>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>>>>> /usr/lib/libtcmalloc.so.4
>>>>>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
>>>>>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
>>>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>>> #16 0x00007fc37d1c47c3 in std::terminate() () from
>>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>>> #17 0x00007fc37d1c49ee in __cxa_throw () from
>>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
>>>>>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
>>>>>>     func=0x90ef80 "unsigned int
>>>>>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
>>>>>> at common/assert.cc:77
>>>>>
>>>>> This means it got an unexpected error when talking to the file system.  If
>>>>> you look in the osd log, it may tell you what that was.  (It may
>>>>> not--there isn't usually the other tcmalloc stuff triggered from the
>>>>> assert handler.)
>>>>>
>>>>> What happens if you restart that ceph-osd daemon?
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>
>>>> Unfortunately I have completely disabled logs during test, so there
>>>> are no suggestion of assert_fail. The main problem was revealed -
>>>> created VMs was pointed to one monitor instead set of three, so there
>>>> may be some unusual things(btw, crashed mon isn`t one from above, but
>>>> a neighbor of crashed osds on first node). After IPMI reset node
>>>> returns back well and cluster behavior seems to be okay - stuck kvm
>>>> I/O somehow prevented even other module load|unload on this node, so I
>>>> finally decided to do hard reset. Despite I`m using almost generic
>>>> wheezy, glibc was updated to 2.15, may be because of this my trace
>>>> appears first time ever. I`m almost sure that fs does not triggered
>>>> this crash and mainly suspecting stuck kvm processes. I`ll rerun test
>>>> with same conditions tomorrow(~500 vms pointed to one mon and very
>>>> high I/O, but with osd logging).
>>>>
>>>>>> #19 0x000000000073148f in FileStore::_do_transaction
>>>>>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
>>>>>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
>>>>>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
>>>>>> tls=..., op_seq=429545) at os/FileStore.cc:2436
>>>>>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
>>>>>> osr=<optimized out>) at os/FileStore.cc:2259
>>>>>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
>>>>>> common/WorkQueue.cc:54
>>>>>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
>>>>>> (this=<optimized out>) at ./common/WorkQueue.h:126
>>>>>> #24 0x00007fc37e3eee9a in start_thread () from
>>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>> #26 0x0000000000000000 in ?? ()
>>>>>>
>>>>>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-26 16:52         ` Andrey Korolyov
@ 2012-08-26 20:44           ` Sage Weil
  2012-09-04  8:13           ` Andrey Korolyov
  1 sibling, 0 replies; 28+ messages in thread
From: Sage Weil @ 2012-08-26 20:44 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Gregory Farnum, ceph-devel

On Sun, 26 Aug 2012, Andrey Korolyov wrote:
> During recovery, following crash happens(simular to
> http://tracker.newdream.net/issues/2126 which marked resolved long
> ago):
> 
> http://xdel.ru/downloads/ceph-log/osd-2012-08-26.txt

This looks very similar to something we hit last week.  The fix is still 
pending (wip_recovery_complete).  Do you have the last few lines of the 
log that include the failed assertion message?
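
Something along these lines should capture them (osd id is illustrative;
the path is the default log location):

  tail -n 200 /var/log/ceph/osd.0.log
  # or jump straight to the assert:
  grep -B2 -A20 'FAILED assert' /var/log/ceph/osd.0.log | tail -n 40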

sage

> 
> On Sat, Aug 25, 2012 at 12:30 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> > On Thu, Aug 23, 2012 at 4:09 AM, Gregory Farnum <greg@inktank.com> wrote:
> >> The tcmalloc backtrace on the OSD suggests this may be unrelated, but
> >> what's the fd limit on your monitor process? You may be approaching
> >> that limit if you've got 500 OSDs and a similar number of clients.
> >>
> >
> > Thanks! I didn`t measured a # of connection because of bearing in mind
> > 1 conn per client, raising limit did the thing. Previously mentioned
> > qemu-kvm zombie does not related to rbd itself - it can be created by
> > destroying libvirt domain which is in saving state or vice-versa, so
> > I`ll put a workaround on this. Right now I am faced different problem
> > - osds dying silently, e.g. not leaving a core, I`ll check logs on the
> > next testing phase.
> >
> >> On Wed, Aug 22, 2012 at 6:55 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> >>> On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
> >>>> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
> >>>>> Hi,
> >>>>>
> >>>>> today during heavy test a pair of osds and one mon died, resulting to
> >>>>> hard lockup of some kvm processes - they went unresponsible and was
> >>>>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
> >>>>> contain sixteen osd on eight nodes and three mons, on first and last
> >>>>> node and on vm outside cluster.
> >>>>>
> >>>>> osd bt:
> >>>>> #0  0x00007fc37d490be3 in
> >>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> >>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> >>>>> (gdb) bt
> >>>>> #0  0x00007fc37d490be3 in
> >>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> >>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> >>>>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
> >>>>> /usr/lib/libtcmalloc.so.4
> >>>>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
> >>>>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
> >>>>> /usr/include/c++/4.7/bits/basic_string.h:246
> >>>>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
> >>>>> /usr/include/c++/4.7/bits/basic_string.h:536
> >>>>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
> >>>>> at /usr/include/c++/4.7/sstream:60
> >>>>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
> >>>>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
> >>>>> #7  pretty_version_to_str () at common/version.cc:40
> >>>>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
> >>>>> out=...) at common/BackTrace.cc:19
> >>>>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
> >>>>> global/signal_handler.cc:91
> >>>>> #10 <signal handler called>
> >>>>> #11 0x00007fc37d490be3 in
> >>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> >>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> >>>>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
> >>>>> /usr/lib/libtcmalloc.so.4
> >>>>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
> >>>>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
> >>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>> #16 0x00007fc37d1c47c3 in std::terminate() () from
> >>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>> #17 0x00007fc37d1c49ee in __cxa_throw () from
> >>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>>>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
> >>>>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
> >>>>>     func=0x90ef80 "unsigned int
> >>>>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
> >>>>> at common/assert.cc:77
> >>>>
> >>>> This means it got an unexpected error when talking to the file system.  If
> >>>> you look in the osd log, it may tell you what that was.  (It may
> >>>> not--there isn't usually the other tcmalloc stuff triggered from the
> >>>> assert handler.)
> >>>>
> >>>> What happens if you restart that ceph-osd daemon?
> >>>>
> >>>> sage
> >>>>
> >>>>
> >>>
> >>> Unfortunately I have completely disabled logs during test, so there
> >>> are no suggestion of assert_fail. The main problem was revealed -
> >>> created VMs was pointed to one monitor instead set of three, so there
> >>> may be some unusual things(btw, crashed mon isn`t one from above, but
> >>> a neighbor of crashed osds on first node). After IPMI reset node
> >>> returns back well and cluster behavior seems to be okay - stuck kvm
> >>> I/O somehow prevented even other module load|unload on this node, so I
> >>> finally decided to do hard reset. Despite I`m using almost generic
> >>> wheezy, glibc was updated to 2.15, may be because of this my trace
> >>> appears first time ever. I`m almost sure that fs does not triggered
> >>> this crash and mainly suspecting stuck kvm processes. I`ll rerun test
> >>> with same conditions tomorrow(~500 vms pointed to one mon and very
> >>> high I/O, but with osd logging).
> >>>
> >>>>> #19 0x000000000073148f in FileStore::_do_transaction
> >>>>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
> >>>>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
> >>>>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
> >>>>> tls=..., op_seq=429545) at os/FileStore.cc:2436
> >>>>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
> >>>>> osr=<optimized out>) at os/FileStore.cc:2259
> >>>>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
> >>>>> common/WorkQueue.cc:54
> >>>>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
> >>>>> (this=<optimized out>) at ./common/WorkQueue.h:126
> >>>>> #24 0x00007fc37e3eee9a in start_thread () from
> >>>>> /lib/x86_64-linux-gnu/libpthread.so.0
> >>>>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >>>>> #26 0x0000000000000000 in ?? ()
> >>>>>
> >>>>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
> >>>>> --
> >>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>>
> >>>>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-25  8:30       ` Andrey Korolyov
@ 2012-08-26 16:52         ` Andrey Korolyov
  2012-08-26 20:44           ` Sage Weil
  2012-09-04  8:13           ` Andrey Korolyov
  0 siblings, 2 replies; 28+ messages in thread
From: Andrey Korolyov @ 2012-08-26 16:52 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel

During recovery, the following crash happens (similar to
http://tracker.newdream.net/issues/2126, which was marked resolved long
ago):

http://xdel.ru/downloads/ceph-log/osd-2012-08-26.txt

On Sat, Aug 25, 2012 at 12:30 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Thu, Aug 23, 2012 at 4:09 AM, Gregory Farnum <greg@inktank.com> wrote:
>> The tcmalloc backtrace on the OSD suggests this may be unrelated, but
>> what's the fd limit on your monitor process? You may be approaching
>> that limit if you've got 500 OSDs and a similar number of clients.
>>
>
> Thanks! I didn`t measured a # of connection because of bearing in mind
> 1 conn per client, raising limit did the thing. Previously mentioned
> qemu-kvm zombie does not related to rbd itself - it can be created by
> destroying libvirt domain which is in saving state or vice-versa, so
> I`ll put a workaround on this. Right now I am faced different problem
> - osds dying silently, e.g. not leaving a core, I`ll check logs on the
> next testing phase.
>
>> On Wed, Aug 22, 2012 at 6:55 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>>> On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
>>>> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
>>>>> Hi,
>>>>>
>>>>> today during heavy test a pair of osds and one mon died, resulting to
>>>>> hard lockup of some kvm processes - they went unresponsible and was
>>>>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
>>>>> contain sixteen osd on eight nodes and three mons, on first and last
>>>>> node and on vm outside cluster.
>>>>>
>>>>> osd bt:
>>>>> #0  0x00007fc37d490be3 in
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>>> (gdb) bt
>>>>> #0  0x00007fc37d490be3 in
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>>>> /usr/lib/libtcmalloc.so.4
>>>>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
>>>>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
>>>>> /usr/include/c++/4.7/bits/basic_string.h:246
>>>>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
>>>>> /usr/include/c++/4.7/bits/basic_string.h:536
>>>>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
>>>>> at /usr/include/c++/4.7/sstream:60
>>>>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
>>>>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
>>>>> #7  pretty_version_to_str () at common/version.cc:40
>>>>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
>>>>> out=...) at common/BackTrace.cc:19
>>>>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
>>>>> global/signal_handler.cc:91
>>>>> #10 <signal handler called>
>>>>> #11 0x00007fc37d490be3 in
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>>>> /usr/lib/libtcmalloc.so.4
>>>>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
>>>>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
>>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #16 0x00007fc37d1c47c3 in std::terminate() () from
>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #17 0x00007fc37d1c49ee in __cxa_throw () from
>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
>>>>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
>>>>>     func=0x90ef80 "unsigned int
>>>>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
>>>>> at common/assert.cc:77
>>>>
>>>> This means it got an unexpected error when talking to the file system.  If
>>>> you look in the osd log, it may tell you what that was.  (It may
>>>> not--there isn't usually the other tcmalloc stuff triggered from the
>>>> assert handler.)
>>>>
>>>> What happens if you restart that ceph-osd daemon?
>>>>
>>>> sage
>>>>
>>>>
>>>
>>> Unfortunately I have completely disabled logs during test, so there
>>> are no suggestion of assert_fail. The main problem was revealed -
>>> created VMs was pointed to one monitor instead set of three, so there
>>> may be some unusual things(btw, crashed mon isn`t one from above, but
>>> a neighbor of crashed osds on first node). After IPMI reset node
>>> returns back well and cluster behavior seems to be okay - stuck kvm
>>> I/O somehow prevented even other module load|unload on this node, so I
>>> finally decided to do hard reset. Despite I`m using almost generic
>>> wheezy, glibc was updated to 2.15, may be because of this my trace
>>> appears first time ever. I`m almost sure that fs does not triggered
>>> this crash and mainly suspecting stuck kvm processes. I`ll rerun test
>>> with same conditions tomorrow(~500 vms pointed to one mon and very
>>> high I/O, but with osd logging).
>>>
>>>>> #19 0x000000000073148f in FileStore::_do_transaction
>>>>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
>>>>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
>>>>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
>>>>> tls=..., op_seq=429545) at os/FileStore.cc:2436
>>>>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
>>>>> osr=<optimized out>) at os/FileStore.cc:2259
>>>>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
>>>>> common/WorkQueue.cc:54
>>>>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
>>>>> (this=<optimized out>) at ./common/WorkQueue.h:126
>>>>> #24 0x00007fc37e3eee9a in start_thread () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> #26 0x0000000000000000 in ?? ()
>>>>>
>>>>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-23  0:09     ` Gregory Farnum
@ 2012-08-25  8:30       ` Andrey Korolyov
  2012-08-26 16:52         ` Andrey Korolyov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Korolyov @ 2012-08-25  8:30 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel

On Thu, Aug 23, 2012 at 4:09 AM, Gregory Farnum <greg@inktank.com> wrote:
> The tcmalloc backtrace on the OSD suggests this may be unrelated, but
> what's the fd limit on your monitor process? You may be approaching
> that limit if you've got 500 OSDs and a similar number of clients.
>

Thanks! I hadn't measured the number of connections because I assumed
one connection per client; raising the limit did the trick. The previously
mentioned qemu-kvm zombie is not related to rbd itself - it can be created
by destroying a libvirt domain while it is in the saving state, or vice
versa, so I'll put a workaround in place for that. Right now I'm facing a
different problem - osds dying silently, i.e. without leaving a core; I'll
check the logs during the next testing phase.

> On Wed, Aug 22, 2012 at 6:55 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
>>> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
>>>> Hi,
>>>>
>>>> today during heavy test a pair of osds and one mon died, resulting to
>>>> hard lockup of some kvm processes - they went unresponsible and was
>>>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
>>>> contain sixteen osd on eight nodes and three mons, on first and last
>>>> node and on vm outside cluster.
>>>>
>>>> osd bt:
>>>> #0  0x00007fc37d490be3 in
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>> (gdb) bt
>>>> #0  0x00007fc37d490be3 in
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>>> /usr/lib/libtcmalloc.so.4
>>>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
>>>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
>>>> /usr/include/c++/4.7/bits/basic_string.h:246
>>>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
>>>> /usr/include/c++/4.7/bits/basic_string.h:536
>>>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
>>>> at /usr/include/c++/4.7/sstream:60
>>>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
>>>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
>>>> #7  pretty_version_to_str () at common/version.cc:40
>>>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
>>>> out=...) at common/BackTrace.cc:19
>>>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
>>>> global/signal_handler.cc:91
>>>> #10 <signal handler called>
>>>> #11 0x00007fc37d490be3 in
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>>> /usr/lib/libtcmalloc.so.4
>>>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
>>>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
>>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #16 0x00007fc37d1c47c3 in std::terminate() () from
>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #17 0x00007fc37d1c49ee in __cxa_throw () from
>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
>>>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
>>>>     func=0x90ef80 "unsigned int
>>>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
>>>> at common/assert.cc:77
>>>
>>> This means it got an unexpected error when talking to the file system.  If
>>> you look in the osd log, it may tell you what that was.  (It may
>>> not--there isn't usually the other tcmalloc stuff triggered from the
>>> assert handler.)
>>>
>>> What happens if you restart that ceph-osd daemon?
>>>
>>> sage
>>>
>>>
>>
>> Unfortunately I have completely disabled logs during test, so there
>> are no suggestion of assert_fail. The main problem was revealed -
>> created VMs was pointed to one monitor instead set of three, so there
>> may be some unusual things(btw, crashed mon isn`t one from above, but
>> a neighbor of crashed osds on first node). After IPMI reset node
>> returns back well and cluster behavior seems to be okay - stuck kvm
>> I/O somehow prevented even other module load|unload on this node, so I
>> finally decided to do hard reset. Despite I`m using almost generic
>> wheezy, glibc was updated to 2.15, may be because of this my trace
>> appears first time ever. I`m almost sure that fs does not triggered
>> this crash and mainly suspecting stuck kvm processes. I`ll rerun test
>> with same conditions tomorrow(~500 vms pointed to one mon and very
>> high I/O, but with osd logging).
>>
>>>> #19 0x000000000073148f in FileStore::_do_transaction
>>>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
>>>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
>>>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
>>>> tls=..., op_seq=429545) at os/FileStore.cc:2436
>>>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
>>>> osr=<optimized out>) at os/FileStore.cc:2259
>>>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
>>>> common/WorkQueue.cc:54
>>>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
>>>> (this=<optimized out>) at ./common/WorkQueue.h:126
>>>> #24 0x00007fc37e3eee9a in start_thread () from
>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #26 0x0000000000000000 in ?? ()
>>>>
>>>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-22 22:55   ` Andrey Korolyov
@ 2012-08-23  0:09     ` Gregory Farnum
  2012-08-25  8:30       ` Andrey Korolyov
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Farnum @ 2012-08-23  0:09 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Sage Weil, ceph-devel

The tcmalloc backtrace on the OSD suggests this may be unrelated, but
what's the fd limit on your monitor process? You may be approaching
that limit if you've got 500 OSDs and a similar number of clients.
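
A quick way to check is something like this (the pidof lookup is only an
example):

  ls /proc/$(pidof ceph-mon)/fd | wc -l               # fds the monitor currently has open
  grep 'open files' /proc/$(pidof ceph-mon)/limits    # current soft/hard limit

If it's close to the limit, raising it (ulimit -n in the init script, or -
if I have the option name right - 'max open files' in the [global] section
of ceph.conf) should help.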

On Wed, Aug 22, 2012 at 6:55 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
>> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
>>> Hi,
>>>
>>> today during heavy test a pair of osds and one mon died, resulting to
>>> hard lockup of some kvm processes - they went unresponsible and was
>>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
>>> contain sixteen osd on eight nodes and three mons, on first and last
>>> node and on vm outside cluster.
>>>
>>> osd bt:
>>> #0  0x00007fc37d490be3 in
>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>> (gdb) bt
>>> #0  0x00007fc37d490be3 in
>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>> /usr/lib/libtcmalloc.so.4
>>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
>>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
>>> /usr/include/c++/4.7/bits/basic_string.h:246
>>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
>>> /usr/include/c++/4.7/bits/basic_string.h:536
>>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
>>> at /usr/include/c++/4.7/sstream:60
>>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
>>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
>>> #7  pretty_version_to_str () at common/version.cc:40
>>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
>>> out=...) at common/BackTrace.cc:19
>>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
>>> global/signal_handler.cc:91
>>> #10 <signal handler called>
>>> #11 0x00007fc37d490be3 in
>>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>>> /usr/lib/libtcmalloc.so.4
>>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
>>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
>>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>> #16 0x00007fc37d1c47c3 in std::terminate() () from
>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>> #17 0x00007fc37d1c49ee in __cxa_throw () from
>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
>>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
>>>     func=0x90ef80 "unsigned int
>>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
>>> at common/assert.cc:77
>>
>> This means it got an unexpected error when talking to the file system.  If
>> you look in the osd log, it may tell you what that was.  (It may
>> not--there isn't usually the other tcmalloc stuff triggered from the
>> assert handler.)
>>
>> What happens if you restart that ceph-osd daemon?
>>
>> sage
>>
>>
>
> Unfortunately I have completely disabled logs during test, so there
> are no suggestion of assert_fail. The main problem was revealed -
> created VMs was pointed to one monitor instead set of three, so there
> may be some unusual things(btw, crashed mon isn`t one from above, but
> a neighbor of crashed osds on first node). After IPMI reset node
> returns back well and cluster behavior seems to be okay - stuck kvm
> I/O somehow prevented even other module load|unload on this node, so I
> finally decided to do hard reset. Despite I`m using almost generic
> wheezy, glibc was updated to 2.15, may be because of this my trace
> appears first time ever. I`m almost sure that fs does not triggered
> this crash and mainly suspecting stuck kvm processes. I`ll rerun test
> with same conditions tomorrow(~500 vms pointed to one mon and very
> high I/O, but with osd logging).
>
>>> #19 0x000000000073148f in FileStore::_do_transaction
>>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
>>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
>>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
>>> tls=..., op_seq=429545) at os/FileStore.cc:2436
>>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
>>> osr=<optimized out>) at os/FileStore.cc:2259
>>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
>>> common/WorkQueue.cc:54
>>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
>>> (this=<optimized out>) at ./common/WorkQueue.h:126
>>> #24 0x00007fc37e3eee9a in start_thread () from
>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #26 0x0000000000000000 in ?? ()
>>>
>>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-22 22:33 ` Sage Weil
@ 2012-08-22 22:55   ` Andrey Korolyov
  2012-08-23  0:09     ` Gregory Farnum
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Korolyov @ 2012-08-22 22:55 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil <sage@inktank.com> wrote:
> On Thu, 23 Aug 2012, Andrey Korolyov wrote:
>> Hi,
>>
>> today during heavy test a pair of osds and one mon died, resulting to
>> hard lockup of some kvm processes - they went unresponsible and was
>> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
>> contain sixteen osd on eight nodes and three mons, on first and last
>> node and on vm outside cluster.
>>
>> osd bt:
>> #0  0x00007fc37d490be3 in
>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>> (gdb) bt
>> #0  0x00007fc37d490be3 in
>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>> /usr/lib/libtcmalloc.so.4
>> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
>> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
>> /usr/include/c++/4.7/bits/basic_string.h:246
>> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
>> /usr/include/c++/4.7/bits/basic_string.h:536
>> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
>> at /usr/include/c++/4.7/sstream:60
>> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
>> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
>> #7  pretty_version_to_str () at common/version.cc:40
>> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
>> out=...) at common/BackTrace.cc:19
>> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
>> global/signal_handler.cc:91
>> #10 <signal handler called>
>> #11 0x00007fc37d490be3 in
>> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
>> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
>> /usr/lib/libtcmalloc.so.4
>> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
>> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #16 0x00007fc37d1c47c3 in std::terminate() () from
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #17 0x00007fc37d1c49ee in __cxa_throw () from
>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
>> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
>>     func=0x90ef80 "unsigned int
>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
>> at common/assert.cc:77
>
> This means it got an unexpected error when talking to the file system.  If
> you look in the osd log, it may tell you what that was.  (It may
> not--there isn't usually the other tcmalloc stuff triggered from the
> assert handler.)
>
> What happens if you restart that ceph-osd daemon?
>
> sage
>
>

Unfortunately I had logging completely disabled during the test, so there
is no hint of what the assert failure was. The main problem has been
identified - the created VMs were pointed at one monitor instead of the
set of three, so there may be some unusual effects (btw, the crashed mon
isn't the one from above, but a neighbour of the crashed osds on the first
node). After an IPMI reset the node came back fine and the cluster
behaviour seems to be okay - the stuck kvm I/O somehow prevented even
loading or unloading other modules on this node, so I finally decided to
do a hard reset. Although I'm using almost generic wheezy, glibc was
updated to 2.15; maybe that is why this trace has appeared for the first
time ever. I'm almost sure the fs did not trigger this crash and mainly
suspect the stuck kvm processes. I'll rerun the test under the same
conditions tomorrow (~500 VMs pointed at one mon and very high I/O, but
with osd logging enabled).
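
For the logging I plan to set something like this in ceph.conf (the levels
are a guess at what should be verbose enough without killing the disks):

  [osd]
      debug osd = 20
      debug filestore = 20
      debug journal = 10
      debug ms = 1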

>> #19 0x000000000073148f in FileStore::_do_transaction
>> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
>> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
>> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
>> tls=..., op_seq=429545) at os/FileStore.cc:2436
>> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
>> osr=<optimized out>) at os/FileStore.cc:2259
>> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
>> common/WorkQueue.cc:54
>> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
>> (this=<optimized out>) at ./common/WorkQueue.h:126
>> #24 0x00007fc37e3eee9a in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #26 0x0000000000000000 in ?? ()
>>
>> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-08-22 20:31 OSD crash Andrey Korolyov
@ 2012-08-22 22:33 ` Sage Weil
  2012-08-22 22:55   ` Andrey Korolyov
  0 siblings, 1 reply; 28+ messages in thread
From: Sage Weil @ 2012-08-22 22:33 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: ceph-devel

On Thu, 23 Aug 2012, Andrey Korolyov wrote:
> Hi,
> 
> today during heavy test a pair of osds and one mon died, resulting to
> hard lockup of some kvm processes - they went unresponsible and was
> killed leaving zombie processes ([kvm] <defunct>). Entire cluster
> contain sixteen osd on eight nodes and three mons, on first and last
> node and on vm outside cluster.
> 
> osd bt:
> #0  0x00007fc37d490be3 in
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> (gdb) bt
> #0  0x00007fc37d490be3 in
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> #1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
> /usr/lib/libtcmalloc.so.4
> #2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
> #3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
> /usr/include/c++/4.7/bits/basic_string.h:246
> #4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
> /usr/include/c++/4.7/bits/basic_string.h:536
> #5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
> at /usr/include/c++/4.7/sstream:60
> #6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
> out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
> #7  pretty_version_to_str () at common/version.cc:40
> #8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
> out=...) at common/BackTrace.cc:19
> #9  0x000000000078f450 in handle_fatal_signal (signum=11) at
> global/signal_handler.cc:91
> #10 <signal handler called>
> #11 0x00007fc37d490be3 in
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
> unsigned long, int) () from /usr/lib/libtcmalloc.so.4
> #12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
> /usr/lib/libtcmalloc.so.4
> #13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
> #14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #16 0x00007fc37d1c47c3 in std::terminate() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #17 0x00007fc37d1c49ee in __cxa_throw () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
> "0 == \"unexpected error\"", file=<optimized out>, line=3007,
>     func=0x90ef80 "unsigned int
> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
> at common/assert.cc:77

This means it got an unexpected error when talking to the file system.  If 
you look in the osd log, it may tell you what that was.  (It may 
not--there isn't usually the other tcmalloc stuff triggered from the 
assert handler.)

What happens if you restart that ceph-osd daemon?
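
For example (the osd id and init invocation here are only illustrative):

  grep -iE 'error|failed' /var/log/ceph/osd.0.log | tail -n 20
  service ceph restart osd.0       # or /etc/init.d/ceph restart osd.0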

sage


> #19 0x000000000073148f in FileStore::_do_transaction
> (this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
> trans_num=trans_num@entry=0) at os/FileStore.cc:3007
> #20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
> tls=..., op_seq=429545) at os/FileStore.cc:2436
> #21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
> osr=<optimized out>) at os/FileStore.cc:2259
> #22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
> common/WorkQueue.cc:54
> #23 0x00000000006823ed in ThreadPool::WorkThread::entry
> (this=<optimized out>) at ./common/WorkQueue.h:126
> #24 0x00007fc37e3eee9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #26 0x0000000000000000 in ?? ()
> 
> mon bt was exactly the same as in http://tracker.newdream.net/issues/2762
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* OSD crash
@ 2012-08-22 20:31 Andrey Korolyov
  2012-08-22 22:33 ` Sage Weil
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Korolyov @ 2012-08-22 20:31 UTC (permalink / raw)
  To: ceph-devel

Hi,

today, during a heavy test, a pair of osds and one mon died, resulting in
a hard lockup of some kvm processes - they became unresponsive and were
killed, leaving zombie processes ([kvm] <defunct>). The entire cluster
contains sixteen osds on eight nodes and three mons, on the first and last
nodes and on a vm outside the cluster.

osd bt:
#0  0x00007fc37d490be3 in
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned long, int) () from /usr/lib/libtcmalloc.so.4
(gdb) bt
#0  0x00007fc37d490be3 in
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned long, int) () from /usr/lib/libtcmalloc.so.4
#1  0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
/usr/lib/libtcmalloc.so.4
#2  0x00007fc37d4a2287 in tc_delete () from /usr/lib/libtcmalloc.so.4
#3  0x00000000008b1224 in _M_dispose (__a=..., this=0x6266d80) at
/usr/include/c++/4.7/bits/basic_string.h:246
#4  ~basic_string (this=0x7fc3736639d0, __in_chrg=<optimized out>) at
/usr/include/c++/4.7/bits/basic_string.h:536
#5  ~basic_stringbuf (this=0x7fc373663988, __in_chrg=<optimized out>)
at /usr/include/c++/4.7/sstream:60
#6  ~basic_ostringstream (this=0x7fc373663980, __in_chrg=<optimized
out>, __vtt_parm=<optimized out>) at /usr/include/c++/4.7/sstream:439
#7  pretty_version_to_str () at common/version.cc:40
#8  0x0000000000791630 in ceph::BackTrace::print (this=0x7fc373663d10,
out=...) at common/BackTrace.cc:19
#9  0x000000000078f450 in handle_fatal_signal (signum=11) at
global/signal_handler.cc:91
#10 <signal handler called>
#11 0x00007fc37d490be3 in
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned long, int) () from /usr/lib/libtcmalloc.so.4
#12 0x00007fc37d490eb4 in tcmalloc::ThreadCache::Scavenge() () from
/usr/lib/libtcmalloc.so.4
#13 0x00007fc37d49eb97 in tc_free () from /usr/lib/libtcmalloc.so.4
#14 0x00007fc37d1c6670 in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x00007fc37d1c4796 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#16 0x00007fc37d1c47c3 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007fc37d1c49ee in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x0000000000844e11 in ceph::__ceph_assert_fail (assertion=0x90c01c
"0 == \"unexpected error\"", file=<optimized out>, line=3007,
    func=0x90ef80 "unsigned int
FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int)")
at common/assert.cc:77
#19 0x000000000073148f in FileStore::_do_transaction
(this=this@entry=0x2cde000, t=..., op_seq=op_seq@entry=429545,
trans_num=trans_num@entry=0) at os/FileStore.cc:3007
#20 0x000000000073484e in FileStore::do_transactions (this=0x2cde000,
tls=..., op_seq=429545) at os/FileStore.cc:2436
#21 0x000000000070c680 in FileStore::_do_op (this=0x2cde000,
osr=<optimized out>) at os/FileStore.cc:2259
#22 0x000000000083ce01 in ThreadPool::worker (this=0x2cde828) at
common/WorkQueue.cc:54
#23 0x00000000006823ed in ThreadPool::WorkThread::entry
(this=<optimized out>) at ./common/WorkQueue.h:126
#24 0x00007fc37e3eee9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#25 0x00007fc37c9864cd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#26 0x0000000000000000 in ?? ()

mon bt was exactly the same as in http://tracker.newdream.net/issues/2762

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-06-17 21:16   ` Sage Weil
@ 2012-06-18  6:41     ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-18  6:41 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 17.06.2012 23:16, Sage Weil wrote:
> Hi Stefan,
>
> I opened http://tracker.newdream.net/issues/2599 to track this, but the
> dump strangely does not include the ceph version or commit sha1.  What
> version were you running?
Sorry, that was my build system: it accidentally removed the .git dir
while building, so the version string couldn't be compiled in.

It was 5efaa8d7799347dfae38333b1fd6e1a87dc76b28

Stefan
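
One simple way to keep the commit id around in a setup like this is to
record it before the build system strips the .git directory. A minimal
sketch (the paths and file name here are only placeholders):

  # record the commit id before .git is removed, so it can be shipped
  # alongside the packages even if the build tree loses its git metadata
  cd /path/to/ceph/source          # placeholder checkout location
  git rev-parse HEAD > BUILD_SHA1
  # ... build and package as usual ...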

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-06-16 13:34 ` Stefan Priebe
@ 2012-06-17 21:16   ` Sage Weil
  2012-06-18  6:41     ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 28+ messages in thread
From: Sage Weil @ 2012-06-17 21:16 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: ceph-devel

Hi Stefan,

I opened http://tracker.newdream.net/issues/2599 to track this, but the 
dump strangely does not include the ceph version or commit sha1.  What 
version were you running?

Thanks!
sage


On Sat, 16 Jun 2012, Stefan Priebe wrote:

> and another crash again ;-(
> 
> 
>      0> 2012-06-16 15:31:32.524369 7fd8935c4700 -1 ./common/Mutex.h: In
> function 'void Mutex::Lock(bool)' thread 7fd8935c4700 time 2012-06-16
> 15:31:32.522446
> ./common/Mutex.h: 110: FAILED assert(r == 0)
> 
>  ceph version  (commit:)
>  1: /usr/bin/ceph-osd() [0x51a07d]
>  2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
>  3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
>  4: (ThreadPool::worker()+0xbb7) [0x7bc087]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
>  6: (()+0x68ca) [0x7fd89db3a8ca]
>  7: (clone()+0x6d) [0x7fd89c1bec0d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> 
> --- end dump of recent events ---
> 2012-06-16 15:31:32.531567 7fd8935c4700 -1 *** Caught signal (Aborted) **
>  in thread 7fd8935c4700
> 
>  ceph version  (commit:)
>  1: /usr/bin/ceph-osd() [0x70e4b9]
>  2: (()+0xeff0) [0x7fd89db42ff0]
>  3: (gsignal()+0x35) [0x7fd89c121225]
>  4: (abort()+0x180) [0x7fd89c124030]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd89c9b5dc5]
>  6: (()+0xcb166) [0x7fd89c9b4166]
>  7: (()+0xcb193) [0x7fd89c9b4193]
>  8: (()+0xcb28e) [0x7fd89c9b428e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x940) [0x78af20]
>  10: /usr/bin/ceph-osd() [0x51a07d]
>  11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
>  12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
>  13: (ThreadPool::worker()+0xbb7) [0x7bc087]
>  14: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
>  15: (()+0x68ca) [0x7fd89db3a8ca]
>  16: (clone()+0x6d) [0x7fd89c1bec0d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> 
> --- begin dump of recent events ---
>      0> 2012-06-16 15:31:32.531567 7fd8935c4700 -1 *** Caught signal (Aborted)
> **
>  in thread 7fd8935c4700
> 
>  ceph version  (commit:)
>  1: /usr/bin/ceph-osd() [0x70e4b9]
>  2: (()+0xeff0) [0x7fd89db42ff0]
>  3: (gsignal()+0x35) [0x7fd89c121225]
>  4: (abort()+0x180) [0x7fd89c124030]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd89c9b5dc5]
>  6: (()+0xcb166) [0x7fd89c9b4166]
>  7: (()+0xcb193) [0x7fd89c9b4193]
>  8: (()+0xcb28e) [0x7fd89c9b428e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x940) [0x78af20]
>  10: /usr/bin/ceph-osd() [0x51a07d]
>  11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
>  12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
>  13: (ThreadPool::worker()+0xbb7) [0x7bc087]
>  14: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
>  15: (()+0x68ca) [0x7fd89db3a8ca]
>  16: (clone()+0x6d) [0x7fd89c1bec0d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> 
> --- end dump of recent events ---
> 
> On 16.06.2012 14:57, Stefan Priebe wrote:
> > Hi,
> > 
> > today I got another osd crash ;-( Strangely, the osd logs are all empty.
> > It seems logrotate hasn't reloaded the daemons, but I still have the
> > core dump file. What's next?
> > 
> > Stefan
> > 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD crash
  2012-06-16 12:57 Stefan Priebe
@ 2012-06-16 13:34 ` Stefan Priebe
  2012-06-17 21:16   ` Sage Weil
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Priebe @ 2012-06-16 13:34 UTC (permalink / raw)
  To: ceph-devel

and another crash again ;-(


      0> 2012-06-16 15:31:32.524369 7fd8935c4700 -1 ./common/Mutex.h: In 
function 'void Mutex::Lock(bool)' thread 7fd8935c4700 time 2012-06-16 
15:31:32.522446
./common/Mutex.h: 110: FAILED assert(r == 0)

  ceph version  (commit:)
  1: /usr/bin/ceph-osd() [0x51a07d]
  2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
  3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
  4: (ThreadPool::worker()+0xbb7) [0x7bc087]
  5: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
  6: (()+0x68ca) [0x7fd89db3a8ca]
  7: (clone()+0x6d) [0x7fd89c1bec0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- end dump of recent events ---
2012-06-16 15:31:32.531567 7fd8935c4700 -1 *** Caught signal (Aborted) **
  in thread 7fd8935c4700

  ceph version  (commit:)
  1: /usr/bin/ceph-osd() [0x70e4b9]
  2: (()+0xeff0) [0x7fd89db42ff0]
  3: (gsignal()+0x35) [0x7fd89c121225]
  4: (abort()+0x180) [0x7fd89c124030]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd89c9b5dc5]
  6: (()+0xcb166) [0x7fd89c9b4166]
  7: (()+0xcb193) [0x7fd89c9b4193]
  8: (()+0xcb28e) [0x7fd89c9b428e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x940) [0x78af20]
  10: /usr/bin/ceph-osd() [0x51a07d]
  11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
  12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
  13: (ThreadPool::worker()+0xbb7) [0x7bc087]
  14: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
  15: (()+0x68ca) [0x7fd89db3a8ca]
  16: (clone()+0x6d) [0x7fd89c1bec0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- begin dump of recent events ---
      0> 2012-06-16 15:31:32.531567 7fd8935c4700 -1 *** Caught signal 
(Aborted) **
  in thread 7fd8935c4700

  ceph version  (commit:)
  1: /usr/bin/ceph-osd() [0x70e4b9]
  2: (()+0xeff0) [0x7fd89db42ff0]
  3: (gsignal()+0x35) [0x7fd89c121225]
  4: (abort()+0x180) [0x7fd89c124030]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd89c9b5dc5]
  6: (()+0xcb166) [0x7fd89c9b4166]
  7: (()+0xcb193) [0x7fd89c9b4193]
  8: (()+0xcb28e) [0x7fd89c9b428e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x940) [0x78af20]
  10: /usr/bin/ceph-osd() [0x51a07d]
  11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
  12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
  13: (ThreadPool::worker()+0xbb7) [0x7bc087]
  14: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
  15: (()+0x68ca) [0x7fd89db3a8ca]
  16: (clone()+0x6d) [0x7fd89c1bec0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- end dump of recent events ---

On 16.06.2012 14:57, Stefan Priebe wrote:
> Hi,
>
> today I got another osd crash ;-( Strangely, the osd logs are all empty.
> It seems logrotate hasn't reloaded the daemons, but I still have the
> core dump file. What's next?
>
> Stefan
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* OSD crash
@ 2012-06-16 12:57 Stefan Priebe
  2012-06-16 13:34 ` Stefan Priebe
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Priebe @ 2012-06-16 12:57 UTC (permalink / raw)
  To: ceph-devel

Hi,

today I got another osd crash ;-( Strangely, the osd logs are all empty.
It seems logrotate hasn't reloaded the daemons, but I still have the
core dump file. What's next?

Stefan
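
With only a core file and the binary, a plain gdb session is usually the
next step. A minimal sketch, assuming the matching (unstripped or
debug-symbol) ceph-osd binary is still installed; the core path is a
placeholder:

  gdb /usr/bin/ceph-osd /path/to/core   # core path is a placeholder
  (gdb) set pagination off
  (gdb) thread apply all bt             # backtraces for every thread
  (gdb) info threads                    # identify the crashing thread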


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: OSD Crash
  2011-05-13 17:03   ` Mark Nigh
@ 2011-05-13 18:34     ` Sage Weil
  0 siblings, 0 replies; 28+ messages in thread
From: Sage Weil @ 2011-05-13 18:34 UTC (permalink / raw)
  To: Mark Nigh; +Cc: Colin McCabe, ceph-devel

On Fri, 13 May 2011, Mark Nigh wrote:
> be handled with these types of HBAs? I assume that a manual --mkfs 
> --mkjournal for the osd is required in addition to a restart of the 
> particular cosd?

Yes (though --mkjournal is implied by --mkfs).  I just wrote a quick wiki 
page for this:

	http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD

Please let us know if anything is unclear or incorrect.

sage
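
For reference, the rebuild boils down to recreating the filesystem and
re-initializing the osd's data and journal before restarting the daemon.
A rough sketch, where the device, mount point, and osd id are only
placeholders and the exact flags can differ by version (see the wiki page
above for the authoritative steps):

  mkfs.btrfs /dev/sdc                       # recreate the filesystem on the new disk
  mount /dev/sdc /mnt/osd2                  # remount at the osd's data directory
  cosd -i 2 -c /etc/ceph/ceph.conf --mkfs   # re-initialize data (journal is implied)
  # then start the cosd for that id the same way it is normally started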

> 
> Mark Nigh
> Systems Architect
> mnigh@netelligent.com
> Netelligent Corporation
> 
> 
> -----Original Message-----
> From: rarecactus@gmail.com [mailto:rarecactus@gmail.com] On Behalf Of Colin McCabe
> Sent: Wednesday, May 11, 2011 4:39 PM
> To: Mark Nigh
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: OSD Crash
> 
> On Wed, May 11, 2011 at 1:47 PM, Mark Nigh <mnigh@netelligent.com> wrote:
> > Some additional testing shows that the underlying filesystem btrfs does fail thus the daemon appropriately fails.
> >
> > The way I am simulating a failed HDD is by removing the HDD. The failure is working,
> > but the problem is when I reinsert the HDD. I think I see the BTRFS filesystem
> > recovery (btrfs filesystem show) and I can start the correct osd daemon that corresponds
> > to the mount point but I do not see the osd come up and in (ceph -s).
> 
> I am assuming that you are using serial ATA (SATA), because that is
> what comes standard on most PCs these days. The last time I used SATA
> hotplug, which was in 2009, the driver support was still pretty flaky
> at the chipset level. Things may have improved since then, but I
> suspect that SATA hotplug is still an uncommon and poorly tested
> operation.
> 
> If you really want to get hotplug working, start with the basics--
> just see whether it works at the driver and block device level. Then
> get btrfs involved, and finally if all that works, try Ceph.
> 
> However, it might be simpler just to simulate failures by randomly
> killing cosd processes.
> When cosd hangs on a sync for too long, it will deliver a SIGABRT to
> itself anyway. So the only part of the process you would be bypassing
> is the hang. You're putting your SATA driver and btrfs's error
> handling paths through a workout, but not really testing Ceph per se.
> 
> Colin
> 
> 
> >
> >  ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 2702
> > 2011-05-11 15:13:58.650515 7fc6a349d760 filestore(/mnt/osd2) mount FIEMAP ioctl is NOT supported
> > 2011-05-11 15:13:58.650754 7fc6a349d760 filestore(/mnt/osd2) mount detected btrfs
> > 2011-05-11 15:13:58.650768 7fc6a349d760 filestore(/mnt/osd2) mount btrfs CLONE_RANGE ioctl is supported
> >
> > If I try to restart the osd daemon, it is unable to kill the process and repeats trying to kill it.
> >
> > Is the underlying file system not recovering like I think? I guess removing and reinserting the HDD isn't the correct way to simulate a dead HDD? Should I follow the process of removing the osd, initializing the osd data dir, and then restarting the osd daemon?
> >
> > Thanks.
> >
> > Mark Nigh
> > Systems Architect
> > Netelligent Corporation
> > mnigh@netelligent.com
> >
> >
> >
> > -----Original Message-----
> > From: Mark Nigh
> > Sent: Wednesday, May 11, 2011 8:12 AM
> > To: 'ceph-devel@vger.kernel.org'
> > Subject: OSD Crash
> >
> > I was performing a few failure test with the osd by removing a HDD from one of the osd host. All was well, the cluster noticed the failure and re-balanced data but when I replace the HDD into the host, the cosd crashed.
> >
> > Here is my setup. 6 osd host with 4 HDDs each (4 cosd daemons running for each host). 1 mon and 2 mds (separate host).
> >
> > Here is the log from the osd0
> >
> > 2011-05-10 16:25:02.776151 7f9e16d36700 -- 10.6.1.92:6800/15566 >> 10.6.1.63:0/2322371038 pipe(0x4315a00 sd=14 pgs=0 cs=0 l=0).accept peer addr is really 10.6.1.63:0/2322371038 (socket is 10.6.1.63:42299/0)
> > os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e22577700'
> > os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
> >  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
> >  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
> >  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
> >  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
> >  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
> >  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
> >  6: (()+0x6d8c) [0x7f9e29434d8c]
> >  7: (clone()+0x6d) [0x7f9e2808204d]
> >  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
> >  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
> >  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
> >  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
> >  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
> >  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
> >  6: (()+0x6d8c) [0x7f9e29434d8c]
> >  7: (clone()+0x6d) [0x7f9e2808204d]
> > os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e21d76700'
> > os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
> >  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
> >  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
> >  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
> >  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
> >  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
> >  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
> >  6: (()+0x6d8c) [0x7f9e29434d8c]
> >  7: (clone()+0x6d) [0x7f9e2808204d]
> >  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
> >  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
> >  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
> >  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
> >  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
> >  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
> >  6: (()+0x6d8c) [0x7f9e29434d8c]
> >  7: (clone()+0x6d) [0x7f9e2808204d]
> > *** Caught signal (Aborted) **
> >  in thread 0x7f9e22577700
> > ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 1414
> > 2011-05-10 22:01:13.762083 7f0620492760 filestore(/mnt/osd0) mount FIEMAP ioctl is NOT supported
> > 2011-05-10 22:01:13.762276 7f0620492760 filestore(/mnt/osd0) mount detected btrfs
> > 2011-05-10 22:01:13.762288 7f0620492760 filestore(/mnt/osd0) mount btrfs CLONE_RANGE ioctl is supported
> > *** Caught signal (Terminated) **
> >  in thread 0x7f061e7b4700. Shutting down.
> >
> > As you can see with the attached log, I try to restart the cosd at 22:01. The service is started but ceph -s doesn't include the osd.
> >
> > Thanks for your help.
> >
> > Mark Nigh
> > Systems Architect
> > Netelligent Corporation
> > mnigh@netelligent.com
> >
> >
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: OSD Crash
  2011-05-11 21:39 ` Colin McCabe
@ 2011-05-13 17:03   ` Mark Nigh
  2011-05-13 18:34     ` Sage Weil
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Nigh @ 2011-05-13 17:03 UTC (permalink / raw)
  To: Colin McCabe; +Cc: ceph-devel

Great information, I appreciate it. Actually, I am running SAS RAID controllers (probably not what I will do in production). Each disk is configured as a single-drive RAID 0 volume so that it shows up in Linux as an individual drive. I think 100% of the problem is the RAID controller's handling of re-insertion of a disk into a RAID 0 group.

Would it be better to use an HBA (without RAID) with 16 SATA/SAS2 ports? Does anyone know how a failure and a reinsertion of a new drive will be handled with these types of HBAs? I assume that a manual --mkfs --mkjournal for the osd is required in addition to a restart of the particular cosd?

Mark Nigh
Systems Architect
mnigh@netelligent.com
Netelligent Corporation


-----Original Message-----
From: rarecactus@gmail.com [mailto:rarecactus@gmail.com] On Behalf Of Colin McCabe
Sent: Wednesday, May 11, 2011 4:39 PM
To: Mark Nigh
Cc: ceph-devel@vger.kernel.org
Subject: Re: OSD Crash

On Wed, May 11, 2011 at 1:47 PM, Mark Nigh <mnigh@netelligent.com> wrote:
> Some additional testing shows that the underlying filesystem btrfs does fail thus the daemon appropriately fails.
>
> The way I am simulating a failed HDD is by removing the HDD. The failure is working,
> but the problem is when I reinsert the HDD. I think I see the BTRFS filesystem
> recovery (btrfs filesystem show) and I can start the correct osd daemon that corresponds
> to the mount point but I do not see the osd come up and in (ceph -s).

I am assuming that you are using serial ATA (SATA), because that is
what comes standard on most PCs these days. The last time I used SATA
hotplug, which was in 2009, the driver support was still pretty flaky
at the chipset level. Things may have improved since then, but I
suspect that SATA hotplug is still an uncommon and poorly tested
operation.

If you really want to get hotplug working, start with the basics--
just see whether it works at the driver and block device level. Then
get btrfs involved, and finally if all that works, try Ceph.

However, it might be simpler just to simulate failures by randomly
killing cosd processes.
When cosd hangs on a sync for too long, it will deliver a SIGABRT to
itself anyway. So the only part of the process you would be bypassing
is the hang. You're putting your SATA driver and btrfs's error
handling paths through a workout, but not really testing Ceph per se.

Colin


>
>  ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 2702
> 2011-05-11 15:13:58.650515 7fc6a349d760 filestore(/mnt/osd2) mount FIEMAP ioctl is NOT supported
> 2011-05-11 15:13:58.650754 7fc6a349d760 filestore(/mnt/osd2) mount detected btrfs
> 2011-05-11 15:13:58.650768 7fc6a349d760 filestore(/mnt/osd2) mount btrfs CLONE_RANGE ioctl is supported
>
> If I try to restart the osd daemon, it is unable to kill the process and repeats trying to kill it.
>
> Is the underlying file system not recovering like I think? I guess removing and reinserting the HDD isn't the correct way to simulate a dead HDD? Should I follow the process of removing the osd, initializing the osd data dir, and then restarting the osd daemon?
>
> Thanks.
>
> Mark Nigh
> Systems Architect
> Netelligent Corporation
> mnigh@netelligent.com
>
>
>
> -----Original Message-----
> From: Mark Nigh
> Sent: Wednesday, May 11, 2011 8:12 AM
> To: 'ceph-devel@vger.kernel.org'
> Subject: OSD Crash
>
> I was performing a few failure test with the osd by removing a HDD from one of the osd host. All was well, the cluster noticed the failure and re-balanced data but when I replace the HDD into the host, the cosd crashed.
>
> Here is my setup. 6 osd host with 4 HDDs each (4 cosd daemons running for each host). 1 mon and 2 mds (separate host).
>
> Here is the log from the osd0
>
> 2011-05-10 16:25:02.776151 7f9e16d36700 -- 10.6.1.92:6800/15566 >> 10.6.1.63:0/2322371038 pipe(0x4315a00 sd=14 pgs=0 cs=0 l=0).accept peer addr is really 10.6.1.63:0/2322371038 (socket is 10.6.1.63:42299/0)
> os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e22577700'
> os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
> os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e21d76700'
> os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
> *** Caught signal (Aborted) **
>  in thread 0x7f9e22577700
> ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 1414
> 2011-05-10 22:01:13.762083 7f0620492760 filestore(/mnt/osd0) mount FIEMAP ioctl is NOT supported
> 2011-05-10 22:01:13.762276 7f0620492760 filestore(/mnt/osd0) mount detected btrfs
> 2011-05-10 22:01:13.762288 7f0620492760 filestore(/mnt/osd0) mount btrfs CLONE_RANGE ioctl is supported
> *** Caught signal (Terminated) **
>  in thread 0x7f061e7b4700. Shutting down.
>
> As you can see with the attached log, I try to restart the cosd at 22:01. The service is started but ceph -s doesn't include the osd.
>
> Thanks for your help.
>
> Mark Nigh
> Systems Architect
> Netelligent Corporation
> mnigh@netelligent.com
>
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: OSD Crash
  2011-05-11 20:47 OSD Crash Mark Nigh
  2011-05-11 21:06 ` Sage Weil
@ 2011-05-11 21:39 ` Colin McCabe
  2011-05-13 17:03   ` Mark Nigh
  1 sibling, 1 reply; 28+ messages in thread
From: Colin McCabe @ 2011-05-11 21:39 UTC (permalink / raw)
  To: Mark Nigh; +Cc: ceph-devel

On Wed, May 11, 2011 at 1:47 PM, Mark Nigh <mnigh@netelligent.com> wrote:
> Some additional testing shows that the underlying filesystem btrfs does fail thus the daemon appropriately fails.
>
> The way I am simulating a failed HDD is by removing the HDD. The failure is working,
> but the problem is when I reinsert the HDD. I think I see the BTRFS filesystem
> recovery (btrfs filesystem show) and I can start the correct osd daemon that corresponds
> to the mount point but I do not see the osd come up and in (ceph -s).

I am assuming that you are using serial ATA (SATA), because that is
what comes standard on most PCs these days. The last time I used SATA
hotplug, which was in 2009, the driver support was still pretty flaky
at the chipset level. Things may have improved since then, but I
suspect that SATA hotplug is still an uncommon and poorly tested
operation.

If you really want to get hotplug working, start with the basics--
just see whether it works at the driver and block device level. Then
get btrfs involved, and finally if all that works, try Ceph.

However, it might be simpler just to simulate failures by randomly
killing cosd processes.
When cosd hangs on a sync for too long, it will deliver a SIGABRT to
itself anyway. So the only part of the process you would be bypassing
is the hang. You're putting your SATA driver and btrfs's error
handling paths through a workout, but not really testing Ceph per se.

Colin
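
A crude version of that random-kill approach, just as a sketch (the
interval and signal choice are arbitrary):

  #!/bin/bash
  # failure injection: every few minutes, kill one randomly chosen cosd
  while true; do
      sleep $(( (RANDOM % 300) + 60 ))      # wait 1-6 minutes (arbitrary)
      pid=$(pgrep cosd | shuf -n 1)         # pick one cosd process at random
      [ -n "$pid" ] && kill -9 "$pid"       # simulate a sudden daemon death
  done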


>
>  ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 2702
> 2011-05-11 15:13:58.650515 7fc6a349d760 filestore(/mnt/osd2) mount FIEMAP ioctl is NOT supported
> 2011-05-11 15:13:58.650754 7fc6a349d760 filestore(/mnt/osd2) mount detected btrfs
> 2011-05-11 15:13:58.650768 7fc6a349d760 filestore(/mnt/osd2) mount btrfs CLONE_RANGE ioctl is supported
>
> If I try to restart the osd daemon, it is unable to kill the process and repeats trying to kill it.
>
> Is the underlying file system not recovering like I think? I guess removing and reinserting the HDD isn't the correct way to simulate a dead HDD? Should I follow the process of removing the osd, initializing the osd data dir, and then restarting the osd daemon?
>
> Thanks.
>
> Mark Nigh
> Systems Architect
> Netelligent Corporation
> mnigh@netelligent.com
>
>
>
> -----Original Message-----
> From: Mark Nigh
> Sent: Wednesday, May 11, 2011 8:12 AM
> To: 'ceph-devel@vger.kernel.org'
> Subject: OSD Crash
>
> I was performing a few failure test with the osd by removing a HDD from one of the osd host. All was well, the cluster noticed the failure and re-balanced data but when I replace the HDD into the host, the cosd crashed.
>
> Here is my setup. 6 osd host with 4 HDDs each (4 cosd daemons running for each host). 1 mon and 2 mds (separate host).
>
> Here is the log from the osd0
>
> 2011-05-10 16:25:02.776151 7f9e16d36700 -- 10.6.1.92:6800/15566 >> 10.6.1.63:0/2322371038 pipe(0x4315a00 sd=14 pgs=0 cs=0 l=0).accept peer addr is really 10.6.1.63:0/2322371038 (socket is 10.6.1.63:42299/0)
> os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e22577700'
> os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
> os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e21d76700'
> os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
> *** Caught signal (Aborted) **
>  in thread 0x7f9e22577700
> ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 1414
> 2011-05-10 22:01:13.762083 7f0620492760 filestore(/mnt/osd0) mount FIEMAP ioctl is NOT supported
> 2011-05-10 22:01:13.762276 7f0620492760 filestore(/mnt/osd0) mount detected btrfs
> 2011-05-10 22:01:13.762288 7f0620492760 filestore(/mnt/osd0) mount btrfs CLONE_RANGE ioctl is supported
> *** Caught signal (Terminated) **
>  in thread 0x7f061e7b4700. Shutting down.
>
> As you can see with the attached log, I try to restart the cosd at 22:01. The service is started but ceph -s doesn't include the osd.
>
> Thanks for your help.
>
> Mark Nigh
> Systems Architect
> Netelligent Corporation
> mnigh@netelligent.com
>
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: OSD Crash
  2011-05-11 20:47 OSD Crash Mark Nigh
@ 2011-05-11 21:06 ` Sage Weil
  2011-05-11 21:39 ` Colin McCabe
  1 sibling, 0 replies; 28+ messages in thread
From: Sage Weil @ 2011-05-11 21:06 UTC (permalink / raw)
  To: Mark Nigh; +Cc: ceph-devel

On Wed, 11 May 2011, Mark Nigh wrote:
> Some additional testing shows that the underlying filesystem btrfs does 
> fail thus the daemon appropriately fails.
> 
> The way I am simulating a failed HDD is by removing the HDD. The failure 
> is working, but the problem is when I reinsert the HDD. I think I see 
> the BTRFS filesystem recovery (btrfs filesystem show) and I can start 
> the correct osd daemon that corresponds to the mount point but I do not 
> see the osd come up and in (ceph -s). The log is limited to
> 
>  ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 2702
> 2011-05-11 15:13:58.650515 7fc6a349d760 filestore(/mnt/osd2) mount FIEMAP ioctl is NOT supported
> 2011-05-11 15:13:58.650754 7fc6a349d760 filestore(/mnt/osd2) mount detected btrfs
> 2011-05-11 15:13:58.650768 7fc6a349d760 filestore(/mnt/osd2) mount btrfs CLONE_RANGE ioctl is supported
> 
> If I try to restart the osd daemon, it is unable to kill the process and 
> repeats trying to kill it.

So:
 - cosd is running fine
 - you pull the drive
 - cosd hangs, cluster marks it down, recovers
 - reinsert the drive
and then
 - cosd gets EIO
  or
 - cosd won't restart
?

It sounds like the problem is that btrfs isn't handling the online 
reinsertion of the disk.  If you restart the machine things should come 
up.

I'm not sure whether handling those kinds of transient disk errors is 
something btrfs is intended to handle any time soon (without a reboot).  
This is one downside to multiple osds and btrfs volumes on the same node: 
if any one btrfs volume hangs up for some reason, the whole node is 
affected (one kernel!) and needs to be restarted.

sage
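
A quick bottom-up check after reinserting the disk can show which layer
is actually stuck. A sketch, with /dev/sdc and /mnt/osd2 as placeholder
names:

  dmesg | tail -n 50            # did the kernel notice the disk returning?
  cat /proc/partitions          # is the block device present again?
  btrfs filesystem show         # does btrfs still see the volume?
  mount | grep /mnt/osd2        # is a stale mount still hanging around?
  # only once these look sane is it worth restarting the cosd and checking
  # "ceph -s" for the osd coming back up and in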




> 
> Is the underlying file system not recovering like I think? I guess 
> removing and reinserting the HDD isn't the correct way to simulate a dead 
> HDD? Should I follow the process of removing the osd, initializing the 
> osd data dir, and then restarting the osd daemon?
> 
> Thanks.
> 
> Mark Nigh
> Systems Architect
> Netelligent Corporation
> mnigh@netelligent.com
> 
> 
> 
> -----Original Message-----
> From: Mark Nigh
> Sent: Wednesday, May 11, 2011 8:12 AM
> To: 'ceph-devel@vger.kernel.org'
> Subject: OSD Crash
> 
> I was performing a few failure test with the osd by removing a HDD from one of the osd host. All was well, the cluster noticed the failure and re-balanced data but when I replace the HDD into the host, the cosd crashed.
> 
> Here is my setup. 6 osd host with 4 HDDs each (4 cosd daemons running for each host). 1 mon and 2 mds (separate host).
> 
> Here is the log from the osd0
> 
> 2011-05-10 16:25:02.776151 7f9e16d36700 -- 10.6.1.92:6800/15566 >> 10.6.1.63:0/2322371038 pipe(0x4315a00 sd=14 pgs=0 cs=0 l=0).accept peer addr is really 10.6.1.63:0/2322371038 (socket is 10.6.1.63:42299/0)
> os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e22577700'
> os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
> os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e21d76700'
> os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
>  ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
>  1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
>  2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
>  3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
>  4: (ThreadPool::worker()+0x2a2) [0x626fa2]
>  5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
>  6: (()+0x6d8c) [0x7f9e29434d8c]
>  7: (clone()+0x6d) [0x7f9e2808204d]
> *** Caught signal (Aborted) **
>  in thread 0x7f9e22577700
> ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 1414
> 2011-05-10 22:01:13.762083 7f0620492760 filestore(/mnt/osd0) mount FIEMAP ioctl is NOT supported
> 2011-05-10 22:01:13.762276 7f0620492760 filestore(/mnt/osd0) mount detected btrfs
> 2011-05-10 22:01:13.762288 7f0620492760 filestore(/mnt/osd0) mount btrfs CLONE_RANGE ioctl is supported
> *** Caught signal (Terminated) **
>  in thread 0x7f061e7b4700. Shutting down.
> 
> As you can see with the attached log, I try to restart the cosd at 22:01. The service is started but ceph -s doesn't include the osd.
> 
> Thanks for your help.
> 
> Mark Nigh
> Systems Architect
> Netelligent Corporation
> mnigh@netelligent.com
> 
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: OSD Crash
@ 2011-05-11 20:47 Mark Nigh
  2011-05-11 21:06 ` Sage Weil
  2011-05-11 21:39 ` Colin McCabe
  0 siblings, 2 replies; 28+ messages in thread
From: Mark Nigh @ 2011-05-11 20:47 UTC (permalink / raw)
  To: Mark Nigh, ceph-devel

Some additional testing shows that the underlying btrfs filesystem does fail, and thus the daemon appropriately fails.

The way I am simulating a failed HDD is by removing the HDD. The failure handling is working, but the problem is when I reinsert the HDD. I think I see the btrfs filesystem recover (btrfs filesystem show), and I can start the correct osd daemon that corresponds to the mount point, but I do not see the osd come up and in (ceph -s). The log is limited to

 ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 2702
2011-05-11 15:13:58.650515 7fc6a349d760 filestore(/mnt/osd2) mount FIEMAP ioctl is NOT supported
2011-05-11 15:13:58.650754 7fc6a349d760 filestore(/mnt/osd2) mount detected btrfs
2011-05-11 15:13:58.650768 7fc6a349d760 filestore(/mnt/osd2) mount btrfs CLONE_RANGE ioctl is supported

If I try to restart the osd daemon, it is unable to kill the process and keeps retrying.

Is the underlying file system not recovering like I think? I guess removing and reinserting the HDD isn't the correct way to simulate a dead HDD? Should I follow the process of removing the osd, initializing the osd data dir, and then restarting the osd daemon?

Thanks.

Mark Nigh
Systems Architect
Netelligent Corporation
mnigh@netelligent.com



-----Original Message-----
From: Mark Nigh
Sent: Wednesday, May 11, 2011 8:12 AM
To: 'ceph-devel@vger.kernel.org'
Subject: OSD Crash

I was performing a few failure test with the osd by removing a HDD from one of the osd host. All was well, the cluster noticed the failure and re-balanced data but when I replace the HDD into the host, the cosd crashed.

Here is my setup. 6 osd host with 4 HDDs each (4 cosd daemons running for each host). 1 mon and 2 mds (separate host).

Here is the log from the osd0

2011-05-10 16:25:02.776151 7f9e16d36700 -- 10.6.1.92:6800/15566 >> 10.6.1.63:0/2322371038 pipe(0x4315a00 sd=14 pgs=0 cs=0 l=0).accept peer addr is really 10.6.1.63:0/2322371038 (socket is 10.6.1.63:42299/0)
os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e22577700'
os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e21d76700'
os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
*** Caught signal (Aborted) **
 in thread 0x7f9e22577700
ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 1414
2011-05-10 22:01:13.762083 7f0620492760 filestore(/mnt/osd0) mount FIEMAP ioctl is NOT supported
2011-05-10 22:01:13.762276 7f0620492760 filestore(/mnt/osd0) mount detected btrfs
2011-05-10 22:01:13.762288 7f0620492760 filestore(/mnt/osd0) mount btrfs CLONE_RANGE ioctl is supported
*** Caught signal (Terminated) **
 in thread 0x7f061e7b4700. Shutting down.

As you can see with the attached log, I try to restart the cosd at 22:01. The service is started but ceph -s doesn't include the osd.

Thanks for your help.

Mark Nigh
Systems Architect
Netelligent Corporation
mnigh@netelligent.com




^ permalink raw reply	[flat|nested] 28+ messages in thread

* OSD Crash
@ 2011-05-11 13:12 Mark Nigh
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Nigh @ 2011-05-11 13:12 UTC (permalink / raw)
  To: ceph-devel

I was performing a few failure test with the osd by removing a HDD from one of the osd host. All was well, the cluster noticed the failure and re-balanced data but when I replace the HDD into the host, the cosd crashed.

Here is my setup. 6 osd host with 4 HDDs each (4 cosd daemons running for each host). 1 mon and 2 mds (separate host).

Here is the log from the osd0

2011-05-10 16:25:02.776151 7f9e16d36700 -- 10.6.1.92:6800/15566 >> 10.6.1.63:0/2322371038 pipe(0x4315a00 sd=14 pgs=0 cs=0 l=0).accept peer addr is really 10.6.1.63:0/2322371038 (socket is 10.6.1.63:42299/0)
os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e22577700'
os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7f9e21d76700'
os/FileStore.cc: 2120: FAILED assert(0 == "EIO handling not implemented")
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
 ceph version 0.27 (commit:793034c62c8e9ffab4af675ca97135fd1b193c9c)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x194) [0x5a0c84]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x156) [0x5a3536]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x13e) [0x598ebe]
 4: (ThreadPool::worker()+0x2a2) [0x626fa2]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x529f1d]
 6: (()+0x6d8c) [0x7f9e29434d8c]
 7: (clone()+0x6d) [0x7f9e2808204d]
*** Caught signal (Aborted) **
 in thread 0x7f9e22577700
ceph version 0.27.commit: 793034c62c8e9ffab4af675ca97135fd1b193c9c. process: cosd. pid: 1414
2011-05-10 22:01:13.762083 7f0620492760 filestore(/mnt/osd0) mount FIEMAP ioctl is NOT supported
2011-05-10 22:01:13.762276 7f0620492760 filestore(/mnt/osd0) mount detected btrfs
2011-05-10 22:01:13.762288 7f0620492760 filestore(/mnt/osd0) mount btrfs CLONE_RANGE ioctl is supported
*** Caught signal (Terminated) **
 in thread 0x7f061e7b4700. Shutting down.

As you can see with the attached log, I try to restart the cosd at 22:01. The service is started but ceph -s doesn't include the osd.

Thanks for your help.

Mark Nigh
Systems Architect
Netelligent Corporation
mnigh@netelligent.com




^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-09-07 16:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-27  0:12 OSD crash Fyodor Ustinov
2011-05-27 15:16 ` Gregory Farnum
2011-05-27 16:41   ` Fyodor Ustinov
2011-05-27 16:49     ` Gregory Farnum
2011-05-27 19:18       ` Gregory Farnum
2011-05-27 19:30         ` Fyodor Ustinov
2011-05-27 22:52         ` Fyodor Ustinov
  -- strict thread matches above, loose matches on Subject: below --
2020-09-07 16:42 osd crash Kaarlo Lahtela
     [not found] <8566685.312.1362419807745.JavaMail.dspano@it1>
2013-03-04 18:02 ` OSD Crash Dave Spano
2012-08-22 20:31 OSD crash Andrey Korolyov
2012-08-22 22:33 ` Sage Weil
2012-08-22 22:55   ` Andrey Korolyov
2012-08-23  0:09     ` Gregory Farnum
2012-08-25  8:30       ` Andrey Korolyov
2012-08-26 16:52         ` Andrey Korolyov
2012-08-26 20:44           ` Sage Weil
2012-09-04  8:13           ` Andrey Korolyov
2012-09-04 15:32             ` Sage Weil
2012-06-16 12:57 Stefan Priebe
2012-06-16 13:34 ` Stefan Priebe
2012-06-17 21:16   ` Sage Weil
2012-06-18  6:41     ` Stefan Priebe - Profihost AG
2011-05-11 20:47 OSD Crash Mark Nigh
2011-05-11 21:06 ` Sage Weil
2011-05-11 21:39 ` Colin McCabe
2011-05-13 17:03   ` Mark Nigh
2011-05-13 18:34     ` Sage Weil
2011-05-11 13:12 Mark Nigh
