From: Willem Jan Withagen <wjw@digiware.nl>
To: Ceph Development <ceph-devel@vger.kernel.org>
Subject: OSD not coming back up again
Date: Thu, 11 Aug 2016 02:40:45 +0200
Message-ID: <bdb93e87-8338-e485-a68d-f988d00f6785@digiware.nl>

Hi,

During testing with cephtool-test-mon.sh, 3 OSDs are started and then
the code executes:
====
  ceph osd set noup
  ceph osd down 0
  ceph osd dump | grep 'osd.0 down'
  ceph osd unset noup
====

Even after 1000 seconds, osd.0 has not come back up.
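
For reference, the wait after those commands can be sketched roughly as
below. This is a minimal sketch, not the actual code from
cephtool-test-mon.sh: the function name wait_for_osd_up, its arguments,
and the 1-second poll interval are assumptions made for illustration.
It also assumes that `ceph osd dump` prints "osd.0 up" for a running
OSD, by analogy with the "osd.0 down" text that the test greps for.

```shell
#!/bin/sh
# Hypothetical helper: poll until "osd.<id> up" appears in the dump output,
# or give up after max_tries attempts.
# dump_cmd defaults to `ceph osd dump`; it is a parameter only so that the
# loop can be tried without a running cluster.
wait_for_osd_up() {
  id=$1
  max_tries=${2:-1000}
  dump_cmd=${3:-"ceph osd dump"}
  interval=${4:-1}
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    # $dump_cmd is deliberately unquoted so "ceph osd dump" splits into words.
    if $dump_cmd | grep -q "osd\.$id up"; then
      return 0
    fi
    echo "waiting for osd.$id to come back up ($i/$max_tries)" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}
```

Called as `wait_for_osd_up 0 1000`, this would return non-zero in the
situation described here, since osd.0 never shows as up.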

Some details are below, but where should I start looking?

Thanx
--WjW


ceph -s gives:

    cluster 9b2500f8-44fb-40d1-91bc-ed522e9db5c6
     health HEALTH_WARN
            8 pgs degraded
            8 pgs stuck unclean
            8 pgs undersized
     monmap e1: 3 mons at {a=127.0.0.1:7202/0,b=127.0.0.1:7203/0,c=127.0.0.1:7204/0}
            election epoch 6, quorum 0,1,2 a,b,c
     osdmap e179: 3 osds: 2 up, 2 in; 8 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v384: 8 pgs, 1 pools, 0 bytes data, 0 objects
            248 GB used, 198 GB / 446 GB avail
                   8 active+undersized+degraded

The pgmap version keeps growing slowly.

This set of lines is repeated over and over in osd.0.log:

2016-08-11 02:31:48.710152 b2f4d00  1 -- 127.0.0.1:0/25528 --> 127.0.0.1:6806/25709 -- osd_ping(ping e175 stamp 2016-08-11 02:31:48.710144) v2 -- ?+0 0xb42bc00 con 0xb12ba40
2016-08-11 02:31:48.710188 b2f4d00  1 -- 127.0.0.1:0/25528 --> 127.0.0.1:6807/25709 -- osd_ping(ping e175 stamp 2016-08-11 02:31:48.710144) v2 -- ?+0 0xb42cc00 con 0xb12bb20
2016-08-11 02:31:48.710214 b2f4d00  1 -- 127.0.0.1:0/25528 --> 127.0.0.1:6810/25910 -- osd_ping(ping e175 stamp 2016-08-11 02:31:48.710144) v2 -- ?+0 0xb42a400 con 0xb12bc00
2016-08-11 02:31:48.710240 b2f4d00  1 -- 127.0.0.1:0/25528 --> 127.0.0.1:6811/25910 -- osd_ping(ping e175 stamp 2016-08-11 02:31:48.710144) v2 -- ?+0 0xb42c000 con 0xb12c140
2016-08-11 02:31:48.710604 b412480  1 -- 127.0.0.1:0/25528 <== osd.1 127.0.0.1:6806/25709 284 ==== osd_ping(ping_reply e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (281956571 0 0) 0xb42d800 con 0xb12ba40
2016-08-11 02:31:48.710665 b486900  1 -- 127.0.0.1:0/25528 <== osd.2 127.0.0.1:6810/25910 283 ==== osd_ping(ping_reply e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (281956571 0 0) 0xb42d200 con 0xb12bc00
2016-08-11 02:31:48.710683 b412480  1 -- 127.0.0.1:0/25528 <== osd.1 127.0.0.1:6806/25709 285 ==== osd_ping(you_died e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (1545205378 0 0) 0xb42d800 con 0xb12ba40
2016-08-11 02:31:48.710780 b412000  1 -- 127.0.0.1:0/25528 <== osd.1 127.0.0.1:6807/25709 284 ==== osd_ping(ping_reply e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (281956571 0 0) 0xb42da00 con 0xb12bb20
2016-08-11 02:31:48.710789 b486900  1 -- 127.0.0.1:0/25528 <== osd.2 127.0.0.1:6810/25910 284 ==== osd_ping(you_died e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (1545205378 0 0) 0xb42d200 con 0xb12bc00
2016-08-11 02:31:48.710821 b486d80  1 -- 127.0.0.1:0/25528 <== osd.2 127.0.0.1:6811/25910 283 ==== osd_ping(ping_reply e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (281956571 0 0) 0xb42d400 con 0xb12c140
2016-08-11 02:31:48.710973 b412000  1 -- 127.0.0.1:0/25528 <== osd.1 127.0.0.1:6807/25709 285 ==== osd_ping(you_died e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (1545205378 0 0) 0xb42da00 con 0xb12bb20
2016-08-11 02:31:48.711028 b486d80  1 -- 127.0.0.1:0/25528 <== osd.2 127.0.0.1:6811/25910 284 ==== osd_ping(you_died e179 stamp 2016-08-11 02:31:48.710144) v2 ==== 47+0+0 (1545205378 0 0) 0xb42d400 con 0xb12c140
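
To make the pattern in these lines easier to see, here is a small
sketch that tallies the osd_ping messages by type and osdmap epoch. The
function name summarize_osd_pings is made up for illustration, and the
regular expression only assumes the "osd_ping(<type> e<epoch>" text as
it appears in the log lines above; it is not a Ceph tool.

```shell
#!/bin/sh
# Hypothetical helper: count osd_ping messages per (type, epoch) pair.
# Reads log text from the files given as arguments, or from stdin if none.
summarize_osd_pings() {
  awk '
    match($0, /osd_ping\([a-z_]+ e[0-9]+/) {
      # Skip the 9 characters of "osd_ping(" and keep "<type> e<epoch>".
      key = substr($0, RSTART + 9, RLENGTH - 9)
      count[key]++
    }
    END { for (key in count) print key, count[key] }
  ' "$@" | LC_ALL=C sort
}
```

Run over the lines quoted above (for example `summarize_osd_pings
osd.0.log` on a file holding just that excerpt; the file name is an
assumption), it prints:

    ping e175 4
    ping_reply e179 4
    you_died e179 4

So osd.0 sends its pings with epoch e175, while every reply from osd.1
and osd.2 carries e179, and each ping_reply is followed by a you_died on
the same connection.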
