From: Yann Dupont <Yann.Dupont@univ-nantes.fr>
To: Sam Just <sam.just@inktank.com>
Cc: Tommi Virtanen <tv@inktank.com>, ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: domino-style OSD crash
Date: Tue, 03 Jul 2012 10:40:11 +0200	[thread overview]
Message-ID: <4FF2AFEB.1010403@univ-nantes.fr> (raw)
In-Reply-To: <CA+4uBUYoDFWcYhmd_EacQgJSf+i=WcA7x-PNWZ0EerD+_fTAjg@mail.gmail.com>

On 04/06/2012 19:40, Sam Just wrote:
> Can you send the osd logs?  The merge_log crashes are probably fixable
> if I can see the logs.
>

Sorry about that. As I said in private mail, I was away from my computer
for a long time.
I can't send those logs anymore; they have been rotated.

Anyway, now that I'm back, I picked up where I left off and tried
to restart the failed nodes.

I upgraded the kernel to 3.5.0-rc4 plus some patches, and btrfs seems OK
for now.

I tried to restart the OSDs with 0.47.3, then with the next branch, and today with 0.48.

4 of 8 nodes fail with the same message:

ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
  1: /usr/bin/ceph-osd() [0x701929]
  2: (()+0xf030) [0x7fe5b4777030]
  3: (gsignal()+0x35) [0x7fe5b33fc4f5]
  4: (abort()+0x180) [0x7fe5b33ff770]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe5b3c4f68d]
  6: (()+0x63796) [0x7fe5b3c4d796]
  7: (()+0x637c3) [0x7fe5b3c4d7c3]
  8: (()+0x639ee) [0x7fe5b3c4d9ee]
  9: (std::__throw_length_error(char const*)+0x5d) [0x7fe5b3c9f5ed]
  10: (()+0xbfad2) [0x7fe5b3ca9ad2]
  11: (char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)+0x35) [0x7fe5b3cab4a5]
  12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&)+0x1d) [0x7fe5b3cab5bd]
  13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6e811d]
  14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x9f) [0x6f681f]
  15: (leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0x4d3) [0x6e3643]
  16: (leveldb::DBImpl::BackgroundCompaction()+0x222) [0x6e45a2]
  17: (leveldb::DBImpl::BackgroundCall()+0x68) [0x6e4e18]
  18: /usr/bin/ceph-osd() [0x6fd401]
  19: (()+0x6b50) [0x7fe5b476eb50]
  20: (clone()+0x6d) [0x7fe5b34a278d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
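
Frames 9 to 12 show std::length_error being thrown while a std::string is constructed from a pointer and a length inside leveldb's FindShortestSeparator. One plausible reading, not confirmed anywhere in this thread, is that the length passed in was larger than the string implementation supports, for example because it came from damaged key data. The sketch below is only a minimal illustration of that exception, not leveldb or ceph code: the function name and the oversized length are hypothetical, and it uses the fill constructor instead of the pointer-and-length one so that no out-of-bounds memory is read.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if asking std::string for more than
// max_size() characters raises std::length_error (the exception type
// named in frame 9 of the trace).
bool oversized_string_throws() {
    const std::string probe;
    // One past the largest size the implementation supports. In the crash
    // the bad length presumably came from on-disk key data (assumption).
    const std::string::size_type bogus_len = probe.max_size() + 1;
    try {
        std::string s(bogus_len, 'x');  // fill constructor, reads no memory
        return false;
    } catch (const std::length_error&) {
        return true;
    }
}
```

When nothing catches such an exception, the C++ runtime calls std::terminate, which by default calls abort() and raises SIGABRT. That is consistent with frames 3 to 5 above and with the signal 6 reported by gdb.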

ceph-osd is from the Debian package (64-bit).
I have a core dump, but I'm afraid it won't help much:

gdb /usr/bin/ceph-osd core
GNU gdb (GDB) 7.0.1-debian

....

Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
---Type <return> to continue, or q <return> to quit---
#0  0x00007fe5b4776efb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0

This time I REALLY CAN (knock on wood) provide the logs and the core.

Granted, this crash was very probably caused by btrfs corruption, but
it would be great if there were a way to recover a crashed OSD node.

Cheers,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
