* OSD crashes
@ 2017-10-10 14:36 Wyllys Ingersoll
  2017-10-10 14:54 ` kefu chai
  0 siblings, 1 reply; 7+ messages in thread
From: Wyllys Ingersoll @ 2017-10-10 14:36 UTC (permalink / raw)
  To: Ceph Development

I'm seeing the following OSD crashes on my system, which is in a heavy
recovery state.

Ceph 10.2.9
Ubuntu 16.04.2
XFS disks with both journal and data on the same dmcrypt-protected devices.


   -13> 2017-10-10 10:33:44.202555 7f49da1158c0  5 osd.78 pg_epoch:
288706 pg[23.3bc(unlocked)] enter Initial
   -12> 2017-10-10 10:33:44.204120 7f49da1158c0  5 osd.78 pg_epoch:
288706 pg[23.3bc( v 29854'429 (0'0,29854'429] local-les=285261 n=4
ec=19254 les/c/f 285261/285281/0 285343/285343/285343) [101,39,100]
r=-1 lpr=0 pi=203138-285342/152 crt=29854'429 lcod 0'0 inactive NOTIFY
NIBBLEWISE] exit Initial 0.001559 0 0.000000
   -11> 2017-10-10 10:33:44.204139 7f49da1158c0  5 osd.78 pg_epoch:
288706 pg[23.3bc( v 29854'429 (0'0,29854'429] local-les=285261 n=4
ec=19254 les/c/f 285261/285281/0 285343/285343/285343) [101,39,100]
r=-1 lpr=0 pi=203138-285342/152 crt=29854'429 lcod 0'0 inactive NOTIFY
NIBBLEWISE] enter Reset
   -10> 2017-10-10 10:33:44.233836 7f49da1158c0  5 osd.78 pg_epoch:
288730 pg[9.8(unlocked)] enter Initial
    -9> 2017-10-10 10:33:44.245781 7f49da1158c0  5 osd.78 pg_epoch:
288730 pg[9.8( v 113941'62509 (35637'59509,113941'62509]
local-les=288727 n=26 ec=1076 les/c/f 288727/288730/0
288719/288725/279537) [78,81,100] r=0 lpr=0 crt=113941'62509 lcod 0'0
mlcod 0'0 inactive NIBBLEWISE] exit Initial 0.011945 0 0.000000
    -8> 2017-10-10 10:33:44.245803 7f49da1158c0  5 osd.78 pg_epoch:
288730 pg[9.8( v 113941'62509 (35637'59509,113941'62509]
local-les=288727 n=26 ec=1076 les/c/f 288727/288730/0
288719/288725/279537) [78,81,100] r=0 lpr=0 crt=113941'62509 lcod 0'0
mlcod 0'0 inactive NIBBLEWISE] enter Reset
    -7> 2017-10-10 10:33:44.509240 7f49da1158c0  5 osd.78 pg_epoch:
288753 pg[1.5e7(unlocked)] enter Initial
    -6> 2017-10-10 10:33:47.185265 7f49da1158c0  5 osd.78 pg_epoch:
288753 pg[1.5e7( v 286018'307337 (208416'292664,286018'307337]
local-les=279555 n=8426 ec=23117 les/c/f 279555/279564/0
279532/279544/279544) [78,34,30] r=0 lpr=0 crt=286018'307337 lcod 0'0
mlcod 0'0 inactive NIBBLEWISE] exit Initial 2.676025 0 0.000000
    -5> 2017-10-10 10:33:47.185302 7f49da1158c0  5 osd.78 pg_epoch:
288753 pg[1.5e7( v 286018'307337 (208416'292664,286018'307337]
local-les=279555 n=8426 ec=23117 les/c/f 279555/279564/0
279532/279544/279544) [78,34,30] r=0 lpr=0 crt=286018'307337 lcod 0'0
mlcod 0'0 inactive NIBBLEWISE] enter Reset
    -4> 2017-10-10 10:33:47.345265 7f49da1158c0  5 osd.78 pg_epoch:
288706 pg[2.36a(unlocked)] enter Initial
    -3> 2017-10-10 10:33:47.360864 7f49da1158c0  5 osd.78 pg_epoch:
288706 pg[2.36a( v 279380'86262 (36401'83241,279380'86262]
local-les=285038 n=56 ec=23131 les/c/f 285038/285160/0
284933/284985/284985) [2,78,59] r=1 lpr=0 pi=284823-284984/2
crt=279380'86262 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial
0.015599 0 0.000000
    -2> 2017-10-10 10:33:47.360893 7f49da1158c0  5 osd.78 pg_epoch:
288706 pg[2.36a( v 279380'86262 (36401'83241,279380'86262]
local-les=285038 n=56 ec=23131 les/c/f 285038/285160/0
284933/284985/284985) [2,78,59] r=1 lpr=0 pi=284823-284984/2
crt=279380'86262 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -1> 2017-10-10 10:33:47.589722 7f49da1158c0  5 osd.78 pg_epoch:
288663 pg[1.2ad(unlocked)] enter Initial
     0> 2017-10-10 10:33:48.931168 7f49da1158c0 -1 *** Caught signal
(Aborted) **
 in thread 7f49da1158c0 thread_name:ceph-osd

 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x984c4e) [0x5597b21e7c4e]
 2: (()+0x11390) [0x7f49d8fd3390]
 3: (gsignal()+0x38) [0x7f49d6f71428]
 4: (abort()+0x16a) [0x7f49d6f7302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f49d78b384d]
 6: (()+0x8d6b6) [0x7f49d78b16b6]
 7: (()+0x8d701) [0x7f49d78b1701]
 8: (()+0x8d919) [0x7f49d78b1919]
 9: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x146)
[0x5597b22f0f86]
 10: (ceph::buffer::copy(char const*, unsigned int)+0x15) [0x5597b22f10f5]
 11: (ceph::buffer::ptr::ptr(char const*, unsigned int)+0x18) [0x5597b22f1128]
 12: (LevelDBStore::to_bufferlist(leveldb::Slice)+0x75) [0x5597b20a09b5]
 13: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::value()+0x32)
[0x5597b20a4232]
 14: (KeyValueDB::IteratorImpl::value()+0x22) [0x5597b1c843f2]
 15: (DBObjectMap::DBObjectMapIteratorImpl::value()+0x25) [0x5597b204cbd5]
 16: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
pg_info_t const&, std::map<eversion_t, hobject_t,
std::less<eversion_t>, std::allocator<std::pair<eversion_t const,
hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
std::__cxx11::basic_ostringstream<char, std::char_traits<char>,
std::allocator<char> >&, bool, DoutPrefixProvider const*,
std::set<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
std::allocator<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >*)+0xb99)
[0x5597b1e92a19]
 17: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x313) [0x5597b1cc0fb3]
 18: (OSD::load_pgs()+0x87a) [0x5597b1bfb96a]
 19: (OSD::init()+0x2026) [0x5597b1c06c56]
 20: (main()+0x2ef1) [0x5597b1b78391]
 21: (__libc_start_main()+0xf0) [0x7f49d6f5c830]
 22: (_start()+0x29) [0x5597b1bb9b99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
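
For what it's worth, frames 9-13 appear to be the copy of the current
leveldb value into a ceph bufferlist while read_log() walks the pg log,
and the __verbose_terminate_handler frame suggests an uncaught C++
exception there (most likely an allocation failure). Below is only a
rough, hypothetical stand-in for that path, not Ceph's actual code, to
show why a failed aligned allocation at that point ends in SIGABRT:

    // Hypothetical sketch of frames 9-13 (not the real Ceph source):
    // the value is copied into a freshly allocated, page-aligned buffer;
    // if the allocation throws, nothing in read_log() catches it, so
    // std::terminate() -> abort() -> "Caught signal (Aborted)".
    #include <cstdlib>   // posix_memalign, free
    #include <cstring>   // memcpy
    #include <new>       // std::bad_alloc

    static char *copy_value_aligned(const char *src, size_t len) {
        void *p = nullptr;
        if (posix_memalign(&p, 4096, len) != 0)  // stand-in for create_aligned()
            throw std::bad_alloc();              // uncaught -> terminate -> abort
        std::memcpy(p, src, len);                // stand-in for buffer::copy()
        return static_cast<char *>(p);
    }

    int main() {
        const char value[] = "pg log entry bytes";  // what value() would return
        char *copy = copy_value_aligned(value, sizeof(value));
        std::free(copy);
        return 0;
    }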

* OSD crashes
@ 2017-09-18 18:24 Wyllys Ingersoll
  0 siblings, 0 replies; 7+ messages in thread
From: Wyllys Ingersoll @ 2017-09-18 18:24 UTC (permalink / raw)
  To: Ceph Development

We have a cluster going through a heavy rebalance operation, but it's
hampered by several OSDs that keep crashing and restarting.

Jewel 10.2.7
Ubuntu 16.04.2

Here is a dump of the log from one of the crashing OSDs:



     0> 2017-09-18 14:08:18.631931 7f481207d8c0 -1 *** Caught signal
(Aborted) **
 in thread 7f481207d8c0 thread_name:ceph-osd

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9770ae) [0x557bbc5fc0ae]
 2: (()+0x11390) [0x7f4810f3b390]
 3: (gsignal()+0x38) [0x7f480eed9428]
 4: (abort()+0x16a) [0x7f480eedb02a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f480f81b84d]
 6: (()+0x8d6b6) [0x7f480f8196b6]
 7: (()+0x8d701) [0x7f480f819701]
 8: (()+0x8d919) [0x7f480f819919]
 9: (()+0x1230f) [0x7f4811c1330f]
 10: (operator new[](unsigned long)+0x4e7) [0x7f4811c374b7]
 11: (leveldb::ReadBlock(leveldb::RandomAccessFile*,
leveldb::ReadOptions const&, leveldb::BlockHandle const&,
leveldb::BlockContents*)+0x313) [0x7f48115c4e63]
 12: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&,
leveldb::Slice const&)+0x276) [0x7f48115c9426]
 13: (()+0x421be) [0x7f48115cd1be]
 14: (()+0x42240) [0x7f48115cd240]
 15: (()+0x4261e) [0x7f48115cd61e]
 16: (()+0x3d835) [0x7f48115c8835]
 17: (()+0x1fffb) [0x7f48115aaffb]
 18: (_ZN12LevelDBStore29LevelDBWholeSpaceIteratorImpl4nextEv()+0x8f)
[0x557bbc4b7a3f]
 19: (_ZN11DBObjectMap23DBObjectMapIteratorImpl4nextEb()+0x34) [0x557bbc46bb24]
 20: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
pg_info_t const&, std::map<eversion_t, hobject_t,
std::less<eversion_t>, std::allocator<std::pair<eversion_t const,
hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
std::__cxx11::basic_ostringstream<char, std::char_traits<char>,
std::allocator<char> >&, DoutPrefixProvider const*,
std::set<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
std::allocator<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >*)+0xac3)
[0x557bbc2a9653]
 21: (_ZN2PG10read_stateEP11ObjectStoreRN4ceph6buffer4listE()+0x2f6)
[0x557bbc0db306]
 22: (OSD::load_pgs()+0x87a) [0x557bbc016f0a]
 23: (OSD::init()+0x2026) [0x557bbc0221f6]
 24: (main()+0x2ea5) [0x557bbbf93dc5]
 25: (__libc_start_main()+0xf0) [0x7f480eec4830]
 26: (_start()+0x29) [0x557bbbfd5459]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 1 ms
   0/ 1 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   0/ 1 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  99/99 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.70.log
--- end dump of recent events ---
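
A few frames in this trace (18, 19, 21) came out raw-mangled (e.g.
_ZN12LevelDBStore29LevelDBWholeSpaceIteratorImpl4nextEv). If anyone
wants to decode such symbols without re-running objdump, a small
stand-alone demangler using the GCC C++ ABI should do; this is just a
helper sketch, not part of Ceph:

    // Demangle _ZN... symbols given on the command line (GCC/libstdc++).
    #include <cxxabi.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char **argv) {
        for (int i = 1; i < argc; ++i) {
            int status = 0;
            // Returns a malloc()'d human-readable name when status == 0.
            char *name = abi::__cxa_demangle(argv[i], nullptr, nullptr,
                                             &status);
            std::printf("%s\n  -> %s\n", argv[i],
                        status == 0 ? name : "(could not demangle)");
            std::free(name);
        }
        return 0;
    }

For example, _ZN2PG10read_stateEP11ObjectStoreRN4ceph6buffer4listE
demangles to PG::read_state(ObjectStore*, ceph::buffer::list&), which
matches frame 17 of the 10.2.9 trace above.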

* OSD crashes
@ 2012-03-06  6:54 Borodin Vladimir
  0 siblings, 0 replies; 7+ messages in thread
From: Borodin Vladimir @ 2012-03-06  6:54 UTC (permalink / raw)
  To: ceph-devel

Hi all.

One of my OSDs crashes when I try to start it. I've turned on "debug
osd = 20" in ceph.conf for this node and put the log here:
http://simply.name/osd.47.log. The ceph.conf file is here:
http://simply.name/ceph.conf. Is there any other information I should
provide?

I updated to 0.43 recently, but there were no problems right after the
upgrade. Actually, I don't know when this problem first appeared.

Regards,
Vladimir.

* OSD crashes
@ 2011-10-11 16:02 Christian Brunner
  2011-10-11 18:09 ` Gregory Farnum
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Brunner @ 2011-10-11 16:02 UTC (permalink / raw)
  To: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 377 bytes --]

Here is another one...

I've now run mkcephfs and started importing all our data from a
backup. However, after two days, two of our OSDs are crashing right
after startup again.

It all started with a "hit suicide timeout"; now I can't start them any
longer. Here is what I have in the logs. I'm sending the complete log
because I'm getting different messages.

Thanks,
Christian

[-- Attachment #2: log.txt.gz --]
[-- Type: application/x-gzip, Size: 34679 bytes --]


Thread overview: 7+ messages
2017-10-10 14:36 OSD crashes Wyllys Ingersoll
2017-10-10 14:54 ` kefu chai
2017-10-10 15:00   ` Wyllys Ingersoll
  -- strict thread matches above, loose matches on Subject: below --
2017-09-18 18:24 Wyllys Ingersoll
2012-03-06  6:54 Borodin Vladimir
2011-10-11 16:02 Christian Brunner
2011-10-11 18:09 ` Gregory Farnum
