From: "Deneau, Tom" <tom.deneau@amd.com>
To: ceph-devel <ceph-devel@vger.kernel.org>
Subject: seg fault in ceph-osd on aarch64
Date: Thu, 26 Mar 2015 17:10:04 +0000
Message-ID: <BC97738F8E7C8742BABED7F06FB9DF91664108DD@SATLEXDAG01.amd.com>

I've been exercising the 64-bit ARM (aarch64) version of Ceph.
These are self-built RPMs from the v0.93 snapshot.
The "cluster" is a single system with 6 hard drives, one OSD per drive.
I've been letting it run rados bench and rados load-gen loops
while running bonnie++ on an RBD mount.

Occasionally (in the latest case, after two days) I've seen ceph-osd crashes
like the one shown below (the last 10 log events are included as well).
If I am reading the objdump correctly, the crash is in the while loop
in the Pipe::connect() code quoted below.

I assume this has not been seen on Ceph builds for other architectures?

What is the recommended way to get more information on this OSD crash?
(It looks like the OSD log levels are 0/5.)

-- Tom Deneau, AMD



      if (reply.tag == CEPH_MSGR_TAG_SEQ) {
        ldout(msgr->cct,10) << "got CEPH_MSGR_TAG_SEQ, reading acked_seq and writing in_seq" << dendl;
        uint64_t newly_acked_seq = 0;
        if (tcp_read((char*)&newly_acked_seq, sizeof(newly_acked_seq)) < 0) {
          ldout(msgr->cct,2) << "connect read error on newly_acked_seq" << dendl;
          goto fail_locked;
        }
        ldout(msgr->cct,2) << " got newly_acked_seq " << newly_acked_seq
                           << " vs out_seq " << out_seq << dendl;
        while (newly_acked_seq > out_seq) {
          Message *m = _get_next_outgoing();
          assert(m);
          ldout(msgr->cct,2) << " discarding previously sent " << m->get_seq()
                             << " " << *m << dendl;
          assert(m->get_seq() <= newly_acked_seq);
          m->put();
          ++out_seq;
        }
        if (tcp_write((char*)&in_seq, sizeof(in_seq)) < 0) {
          ldout(msgr->cct,2) << "connect write error on in_seq" << dendl;
          goto fail_locked;
        }
      }
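To make the invariant in the loop above concrete, here is a minimal standalone C++ model of it. It is a sketch under simplifying assumptions, not Ceph's code. `FakeMessage` and `discard_acked` are hypothetical names. The outgoing queue is modeled as a plain FIFO `std::deque` of sequence numbers, whereas the real `_get_next_outgoing()` draws from the Pipe's priority queues and returns a `Message*`. Instead of asserting, the model returns -1 when the peer's `newly_acked_seq` cannot be matched against the queued messages.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a queued Ceph Message; only the sequence
// number matters for this model.
struct FakeMessage {
  uint64_t seq;
};

// Hypothetical model of the CEPH_MSGR_TAG_SEQ discard loop in Pipe::connect().
// `queued` holds previously sent messages awaiting resend, oldest first
// (simplified to a FIFO; the real code uses priority queues).
// `out_seq` advances once per discarded message, as in the quoted loop.
// Returns the number of messages discarded, or -1 in the cases where the
// quoted code's assert(m) or assert(m->get_seq() <= newly_acked_seq) would fire.
int discard_acked(std::deque<FakeMessage>& queued, uint64_t& out_seq,
                  uint64_t newly_acked_seq) {
  int discarded = 0;
  while (newly_acked_seq > out_seq) {
    if (queued.empty()) {
      // Peer acked more messages than we have queued (assert(m) case).
      return -1;
    }
    if (queued.front().seq > newly_acked_seq) {
      // Next queued message is newer than the ack (second assert case).
      return -1;
    }
    queued.pop_front();  // stands in for m->put()
    ++out_seq;
    ++discarded;
  }
  return discarded;
}
```

For example, with messages 11, 12 and 13 queued and `out_seq` = 10, an ack of 12 discards two messages and leaves `out_seq` at 12. A bogus ack such as 1000 then drains the remaining message and reports -1. That is the situation in which the quoted code's `assert(m)` would fire.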




  -10> 2015-03-25 09:41:11.950684 3ff8f05f010  5 -- op tracker -- seq: 3499479, time: 2015-03-25 09:41:11.950683, event: done, op: osd_op(client.8322.0:1640 benchmark_data_b0c-upstairs_5647_object343 [read 0~4194304] 1.5c587e9e ack+read+known_if_redirected e316)
    -9> 2015-03-25 09:41:11.951356 3ff8659f010  1 -- 10.236.136.224:6804/4928 <== client.8322 10.236.136.224:0/1020871 256 ==== osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316) v5 ==== 201+0+0 (2802495612 0 0) 0x1e67cd80 con 0x71f4c80
    -8> 2015-03-25 09:41:11.951397 3ff8659f010  5 -- op tracker -- seq: 3499480, time: 2015-03-25 09:41:11.951205, event: header_read, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
    -7> 2015-03-25 09:41:11.951411 3ff8659f010  5 -- op tracker -- seq: 3499480, time: 2015-03-25 09:41:11.951214, event: throttled, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
    -6> 2015-03-25 09:41:11.951420 3ff8659f010  5 -- op tracker -- seq: 3499480, time: 2015-03-25 09:41:11.951351, event: all_read, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
    -5> 2015-03-25 09:41:11.951429 3ff8659f010  5 -- op tracker -- seq: 3499480, time: 0.000000, event: dispatched, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
    -4> 2015-03-25 09:41:11.951561 3ff9205f010  5 -- op tracker -- seq: 3499480, time: 2015-03-25 09:41:11.951560, event: reached_pg, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
    -3> 2015-03-25 09:41:11.951627 3ff9205f010  5 -- op tracker -- seq: 3499480, time: 2015-03-25 09:41:11.951627, event: started, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
    -2> 2015-03-25 09:41:11.961959 3ff9205f010  1 -- 10.236.136.224:6804/4928 --> 10.236.136.224:0/1020871 -- osd_op_reply(1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] v0'0 uv2 ondisk = 0) v6 -- ?+0 0x3b39340 con 0x71f4c80
    -1> 2015-03-25 09:41:11.962043 3ff9205f010  5 -- op tracker -- seq: 3499480, time: 2015-03-25 09:41:11.962043, event: done, op: osd_op(client.8322.0:1642 benchmark_data_b0c-upstairs_5647_object411 [read 0~4194304] 1.f2b5749d ack+read+known_if_redirected e316)
     0> 2015-03-25 09:41:12.030725 3ff8619f010 -1 *** Caught signal (Segmentation fault) **
 in thread 3ff8619f010

 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xacf140]
 2: [0x3ffa9520510]
 3: (Pipe::connect()+0x301c) [0xc8c37c]
 4: (Pipe::Writer::entry()+0x10) [0xc96b9c]
 5: (Thread::entry_wrapper()+0x50) [0xba3bec]
 6: (()+0x6f30) [0x3ffa9116f30]
 7: (()+0xdd910) [0x3ffa8d8d910]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Thread overview: 4+ messages
2015-03-26 17:10 Deneau, Tom [this message]
2015-03-26 17:16 ` seg fault in ceph-osd on aarch64 Sage Weil
2015-03-26 18:05   ` Deneau, Tom
2015-03-26 18:11     ` Sage Weil