* Toying with a FreeBSD cluster results in a crash
@ 2017-04-07 14:34 Willem Jan Withagen
  2017-04-08  3:33 ` kefu chai
  0 siblings, 1 reply; 5+ messages in thread
From: Willem Jan Withagen @ 2017-04-07 14:34 UTC (permalink / raw)
  To: Ceph Development

Hi,

I'm playing with my FreeBSD test cluster.
It is full of different types of disks, some of which are not
very new.

A deep scrub on it showed things like:
 filestore(/var/lib/ceph/osd/osd.7) error creating
#-1:4962ce63:::inc_osdmap.705:0#
(/var/lib/ceph/osd/osd.7/current/meta/inc\uosdmap
.705__0_C6734692__none) in index: (87) Attribute not found


I've built the cluster with:
	osd pool default size      = 1

Created some pools, and then increased it to:
	osd pool default size      = 3

Restarted the pools, but one pool does not want to come back up, so now
I wonder whether the restart problem is due to issues like the one
quoted above?
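
As far as I understand, the config default only applies to pools
created afterwards, so the existing pools would also need a per-pool
change, roughly like this untested sketch (the pool name is just an
example):

 # bump the replica count on an existing pool, one pool at a time
 ceph osd pool set cephfsdata size 3
 # min_size is optional; 2 is the usual companion for size 3
 ceph osd pool set cephfsdata min_size 2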

And how do I clean up this mess without wiping the cluster and starting
over? :) Note that this is just practice for me before doing somewhat
more tricky work.

Thanx,
--WjW


    -6> 2017-04-07 16:04:57.530301 806e16000  0 osd.7 733 crush map has
features 2200130813952, adjusting msgr requires for clients
    -5> 2017-04-07 16:04:57.530314 806e16000  0 osd.7 733 crush map has
features 2200130813952 was 8705, adjusting msgr requires for mons
    -4> 2017-04-07 16:04:57.530321 806e16000  0 osd.7 733 crush map has
features 2200130813952, adjusting msgr requires for osds
    -3> 2017-04-07 16:04:57.552968 806e16000  0 osd.7 733 load_pgs
    -2> 2017-04-07 16:04:57.553479 806e16000 -1 osd.7 0 failed to load
OSD map for epoch 714, got 0 bytes
    -1> 2017-04-07 16:04:57.553493 806e16000 -1 osd.7 733 load_pgs: have
pgid 8.e9 at epoch 714, but missing map.  Crashing.
     0> 2017-04-07 16:04:57.554157 806e16000 -1
/usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: In function
'void OSD::load_pgs()' thread 806e16000 time 2017-04-07 16:04:57.553497
/usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: 3360: FAILED
assert(0 == "Missing map in load_pgs")

Most of the pools are in an "okay" state:
[/var/log/ceph] wjw@cephtest> ceph -s
    cluster 746e196d-e344-11e6-b4b7-0025903744dc
     health HEALTH_ERR
            45 pgs are stuck inactive for more than 300 seconds
            7 pgs down
            38 pgs stale
            7 pgs stuck inactive
            38 pgs stuck stale
            7 pgs stuck unclean
            pool cephfsdata has many more objects per pg than average
(too few pgs?)
     monmap e5: 3 mons at
{a=192.168.10.70:6789/0,b=192.168.9.79:6789/0,c=192.168.8.79:6789/0}
            election epoch 114, quorum 0,1,2 c,b,a
      fsmap e755: 1/1/1 up {0=alpha=up:active}
        mgr active: admin
     osdmap e877: 8 osds: 7 up, 7 in; 6 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v681735: 1864 pgs, 7 pools, 12416 MB data, 354 kobjects
            79963 MB used, 7837 GB / 7915 GB avail
                1819 active+clean
                  38 stale+active+clean
                   6 down
                   1 down+remapped

The only exceptions are the PGs that lived solely on the OSD that
doesn't want to come up.


* Re: Toying with a FreeBSD cluster results in a crash
  2017-04-07 14:34 Toying with a FreeBSD cluster results in a crash Willem Jan Withagen
@ 2017-04-08  3:33 ` kefu chai
  2017-04-08 12:43   ` Willem Jan Withagen
  0 siblings, 1 reply; 5+ messages in thread
From: kefu chai @ 2017-04-08  3:33 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Fri, Apr 7, 2017 at 10:34 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> Hi,
>
> I'm playing with my/a FreeBSD test cluster.
> It is full with different types of disks, and sometimes they are not
> very new.
>
> The deepscrub on it showed things like:
>  filestore(/var/lib/ceph/osd/osd.7) error creating
> #-1:4962ce63:::inc_osdmap.705:0#
> (/var/lib/ceph/osd/osd.7/current/meta/inc\uosdmap
> .705__0_C6734692__none) in index: (87) Attribute not found

Filestore stores subdir state using xattrs. Could you check the xattrs
of your meta collection using something like:

 lsextattr user /var/lib/ceph/osd/osd.7/current/meta

If nothing shows up, did you enable xattrs on the mounted fs in
which /var/lib/ceph/osd/osd.7/current/meta is located?
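
Also, maybe compare the meta dir of a healthy OSD with that of osd.7,
and verify that the directory still accepts user xattrs at all. A
rough, untested sketch (osd.1 is just an example of a healthy OSD, and
the 'test' attribute is only a scratch name):

 # compare a known-good OSD's meta dir with the broken one
 lsextattr user /var/lib/ceph/osd/osd.1/current/meta
 lsextattr user /var/lib/ceph/osd/osd.7/current/meta

 # verify the broken meta dir still accepts user xattrs
 setextattr user test hello /var/lib/ceph/osd/osd.7/current/meta
 getextattr user test /var/lib/ceph/osd/osd.7/current/meta
 rmextattr user test /var/lib/ceph/osd/osd.7/current/meta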

>
>
> I've build the cluster with:
>         osd pool default size      = 1
>
> Created some pools, and then increased
>         osd pool default size      = 3
>
> Restarted the pools, but 1 pool does not want to reboot, so now I wonder
> if the restarting problem is due to issue like quoted above?
>
> And how do I cleanup this mess, without wiping the cluster and
> restarting. :) Note that it is just practice for me doing somewhat more
> tricky work.
>
> Thanx,
> --WjW
>
>
>     -6> 2017-04-07 16:04:57.530301 806e16000  0 osd.7 733 crush map has
> features 2200130813952, adjusting msgr requires for clients
>     -5> 2017-04-07 16:04:57.530314 806e16000  0 osd.7 733 crush map has
> features 2200130813952 was 8705, adjusting msgr requires for mons
>     -4> 2017-04-07 16:04:57.530321 806e16000  0 osd.7 733 crush map has
> features 2200130813952, adjusting msgr requires for osds
>     -3> 2017-04-07 16:04:57.552968 806e16000  0 osd.7 733 load_pgs
>     -2> 2017-04-07 16:04:57.553479 806e16000 -1 osd.7 0 failed to load
> OSD map for epoch 714, got 0 bytes
>     -1> 2017-04-07 16:04:57.553493 806e16000 -1 osd.7 733 load_pgs: have
> pgid 8.e9 at epoch 714, but missing map.  Crashing.
>      0> 2017-04-07 16:04:57.554157 806e16000 -1
> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: In function
> 'void OSD::load_pgs()' thread 806e16000 time 2017-04-0
> 7 16:04:57.553497
> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: 3360: FAILED
> assert(0 == "Missing map in load_pgs")
>
> Most of the pools are in "oke" state:
> [/var/log/ceph] wjw@cephtest> ceph -s
>     cluster 746e196d-e344-11e6-b4b7-0025903744dc
>      health HEALTH_ERR
>             45 pgs are stuck inactive for more than 300 seconds
>             7 pgs down
>             38 pgs stale
>             7 pgs stuck inactive
>             38 pgs stuck stale
>             7 pgs stuck unclean
>             pool cephfsdata has many more objects per pg than average
> (too few pgs?)
>      monmap e5: 3 mons at
> {a=192.168.10.70:6789/0,b=192.168.9.79:6789/0,c=192.168.8.79:6789/0}
>             election epoch 114, quorum 0,1,2 c,b,a
>       fsmap e755: 1/1/1 up {0=alpha=up:active}
>         mgr active: admin
>      osdmap e877: 8 osds: 7 up, 7 in; 6 remapped pgs
>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v681735: 1864 pgs, 7 pools, 12416 MB data, 354 kobjects
>             79963 MB used, 7837 GB / 7915 GB avail
>                 1819 active+clean
>                   38 stale+active+clean
>                    6 down
>                    1 down+remapped
>
> Just the ones that were only on the OSD that doesn't want to come up.



-- 
Regards
Kefu Chai


* Re: Toying with a FreeBSD cluster results in a crash
  2017-04-08  3:33 ` kefu chai
@ 2017-04-08 12:43   ` Willem Jan Withagen
  2017-04-10  8:12     ` kefu chai
  0 siblings, 1 reply; 5+ messages in thread
From: Willem Jan Withagen @ 2017-04-08 12:43 UTC (permalink / raw)
  To: kefu chai; +Cc: Ceph Development

On 08-04-17 05:33, kefu chai wrote:
> On Fri, Apr 7, 2017 at 10:34 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> Hi,
>>
>> I'm playing with my/a FreeBSD test cluster.
>> It is full with different types of disks, and sometimes they are not
>> very new.
>>
>> The deepscrub on it showed things like:
>>  filestore(/var/lib/ceph/osd/osd.7) error creating
>> #-1:4962ce63:::inc_osdmap.705:0#
>> (/var/lib/ceph/osd/osd.7/current/meta/inc\uosdmap
>> .705__0_C6734692__none) in index: (87) Attribute not found
> 
> filestore stores subdir states using xattr, could you check the xattr
> of your meta collection using something like:
> 
>  lsextattr user /var/lib/ceph/osd/osd.7/current/meta
> 
> if nothing shows up, did you enable the xattr on the mounted fs in
> which /var/lib/ceph/osd/osd.7/current/meta is located?

This is on ZFS, where extended attributes are on by default.

I checked other parts of the OSD tree, and several other directories
did have attributes. But everything in this directory had nothing set.
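
A quick way to see which directories under current/ lost their
attributes would be something like this (untested sketch):

 # print the user xattrs of every top-level collection dir;
 # directories that show nothing after the colon are the suspect ones
 for d in /var/lib/ceph/osd/osd.7/current/*/ ; do
     printf '%s: ' "$d"; lsextattr -q user "$d"
 done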

Now the trick question is whether it can recover from a crash like this:
>> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: 3360:
>> FAILED assert(0 == "Missing map in load_pgs")

If it depends on information in its local tree, things might be too
corrupt to restart it...
But if the map needs to be fetched from the rest of the cluster, do I
have a different type of problem?
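
If it helps, I assume something like this would show whether the rest
of the cluster can still hand out that map (epoch taken from the log
above, untested on my side; the mons may well have trimmed such an old
epoch already):

 # ask the monitors for the osdmap epoch that osd.7 says it is missing
 ceph osd getmap 714 -o /tmp/osdmap.714
 # and inspect it
 osdmaptool --print /tmp/osdmap.714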

--WjW
> 
>>
>>
>> I've build the cluster with:
>>         osd pool default size      = 1
>>
>> Created some pools, and then increased
>>         osd pool default size      = 3
>>
>> Restarted the pools, but 1 pool does not want to reboot, so now I wonder
>> if the restarting problem is due to issue like quoted above?
>>
>> And how do I cleanup this mess, without wiping the cluster and
>> restarting. :) Note that it is just practice for me doing somewhat more
>> tricky work.
>>
>> Thanx,
>> --WjW
>>
>>
>>     -6> 2017-04-07 16:04:57.530301 806e16000  0 osd.7 733 crush map has
>> features 2200130813952, adjusting msgr requires for clients
>>     -5> 2017-04-07 16:04:57.530314 806e16000  0 osd.7 733 crush map has
>> features 2200130813952 was 8705, adjusting msgr requires for mons
>>     -4> 2017-04-07 16:04:57.530321 806e16000  0 osd.7 733 crush map has
>> features 2200130813952, adjusting msgr requires for osds
>>     -3> 2017-04-07 16:04:57.552968 806e16000  0 osd.7 733 load_pgs
>>     -2> 2017-04-07 16:04:57.553479 806e16000 -1 osd.7 0 failed to load
>> OSD map for epoch 714, got 0 bytes
>>     -1> 2017-04-07 16:04:57.553493 806e16000 -1 osd.7 733 load_pgs: have
>> pgid 8.e9 at epoch 714, but missing map.  Crashing.
>>      0> 2017-04-07 16:04:57.554157 806e16000 -1
>> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: In function
>> 'void OSD::load_pgs()' thread 806e16000 time 2017-04-0
>> 7 16:04:57.553497
>> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: 3360: FAILED
>> assert(0 == "Missing map in load_pgs")
>>
>> Most of the pools are in "oke" state:
>> [/var/log/ceph] wjw@cephtest> ceph -s
>>     cluster 746e196d-e344-11e6-b4b7-0025903744dc
>>      health HEALTH_ERR
>>             45 pgs are stuck inactive for more than 300 seconds
>>             7 pgs down
>>             38 pgs stale
>>             7 pgs stuck inactive
>>             38 pgs stuck stale
>>             7 pgs stuck unclean
>>             pool cephfsdata has many more objects per pg than average
>> (too few pgs?)
>>      monmap e5: 3 mons at
>> {a=192.168.10.70:6789/0,b=192.168.9.79:6789/0,c=192.168.8.79:6789/0}
>>             election epoch 114, quorum 0,1,2 c,b,a
>>       fsmap e755: 1/1/1 up {0=alpha=up:active}
>>         mgr active: admin
>>      osdmap e877: 8 osds: 7 up, 7 in; 6 remapped pgs
>>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>       pgmap v681735: 1864 pgs, 7 pools, 12416 MB data, 354 kobjects
>>             79963 MB used, 7837 GB / 7915 GB avail
>>                 1819 active+clean
>>                   38 stale+active+clean
>>                    6 down
>>                    1 down+remapped
>>
>> Just the ones that were only on the OSD that doesn't want to come up.
> 
> 
> 



* Re: Toying with a FreeBSD cluster results in a crash
  2017-04-08 12:43   ` Willem Jan Withagen
@ 2017-04-10  8:12     ` kefu chai
  2017-04-10 12:32       ` Willem Jan Withagen
  0 siblings, 1 reply; 5+ messages in thread
From: kefu chai @ 2017-04-10  8:12 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Sat, Apr 8, 2017 at 8:43 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> On 08-04-17 05:33, kefu chai wrote:
>> On Fri, Apr 7, 2017 at 10:34 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> Hi,
>>>
>>> I'm playing with my/a FreeBSD test cluster.
>>> It is full with different types of disks, and sometimes they are not
>>> very new.
>>>
>>> The deepscrub on it showed things like:
>>>  filestore(/var/lib/ceph/osd/osd.7) error creating
>>> #-1:4962ce63:::inc_osdmap.705:0#
>>> (/var/lib/ceph/osd/osd.7/current/meta/inc\uosdmap
>>> .705__0_C6734692__none) in index: (87) Attribute not found
>>
>> filestore stores subdir states using xattr, could you check the xattr
>> of your meta collection using something like:
>>
>>  lsextattr user /var/lib/ceph/osd/osd.7/current/meta
>>
>> if nothing shows up, did you enable the xattr on the mounted fs in
>> which /var/lib/ceph/osd/osd.7/current/meta is located?
>
> This is on ZFS, and there attributes are on per default.
>
> I checked other parts of the OSD files and several other did have
> attributes. But everything in this directory had nothing set.

That sounds weird; it looks like a bug. Can you reproduce it?

>
> Now the trick question is if it can recover from a crash like this:
>>> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: 3360:
>>> FAILED assert(0 == "Missing map in load_pgs")
>
> If it depends on infomation in its local tree, things might be too
> corrupt to restart it...

Yes, the osdmap is stored in the meta collection, which in turn is
missing the subdir state stored in the xattrs.

> But if it needs to be fetched from the rest of the cluster, I had a
> different type of problem?

Maybe you can rebuild that osd? The meta collection is different from
one osd to another; I am not sure if we can copy it over from another
osd.
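
If you go that way, the usual remove/re-add sequence is roughly the
following (untested sketch, ids taken from your report; note that
anything that only lived on osd.7, i.e. the size=1 pools, is gone for
good):

 # take the broken OSD out and remove it from crush, auth and the osdmap
 ceph osd out osd.7
 ceph osd crush remove osd.7
 ceph auth del osd.7
 ceph osd rm osd.7
 # then wipe and re-provision the disk as a fresh osd.7;
 # the FreeBSD-specific deploy steps are whatever was used the first time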

-- 
Regards
Kefu Chai


* Re: Toying with a FreeBSD cluster results in a crash
  2017-04-10  8:12     ` kefu chai
@ 2017-04-10 12:32       ` Willem Jan Withagen
  0 siblings, 0 replies; 5+ messages in thread
From: Willem Jan Withagen @ 2017-04-10 12:32 UTC (permalink / raw)
  To: kefu chai; +Cc: Ceph Development

On 10-4-2017 10:12, kefu chai wrote:
> On Sat, Apr 8, 2017 at 8:43 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> On 08-04-17 05:33, kefu chai wrote:
>>> On Fri, Apr 7, 2017 at 10:34 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>> Hi,
>>>>
>>>> I'm playing with my/a FreeBSD test cluster.
>>>> It is full with different types of disks, and sometimes they are not
>>>> very new.
>>>>
>>>> The deepscrub on it showed things like:
>>>>  filestore(/var/lib/ceph/osd/osd.7) error creating
>>>> #-1:4962ce63:::inc_osdmap.705:0#
>>>> (/var/lib/ceph/osd/osd.7/current/meta/inc\uosdmap
>>>> .705__0_C6734692__none) in index: (87) Attribute not found
>>>
>>> filestore stores subdir states using xattr, could you check the xattr
>>> of your meta collection using something like:
>>>
>>>  lsextattr user /var/lib/ceph/osd/osd.7/current/meta
>>>
>>> if nothing shows up, did you enable the xattr on the mounted fs in
>>> which /var/lib/ceph/osd/osd.7/current/meta is located?
>>
>> This is on ZFS, and there attributes are on per default.
>>
>> I checked other parts of the OSD files and several other did have
>> attributes. But everything in this directory had nothing set.
> 
> that sounds weird. looks like a bug. can you reproduce it?

It is a long-standing (> 3 months) test cluster that has been severely
abused by me learning Ceph and testing ceph-fuse.

>> Now the trick question is if it can recover from a crash like this:
>>>> /usr/ports/net/ceph/work/ceph-wip.FreeBSD/src/osd/OSD.cc: 3360:
>>>> FAILED assert(0 == "Missing map in load_pgs")
>>
>> If it depends on infomation in its local tree, things might be too
>> corrupt to restart it...
> 
> yes, the osdmap is stored in the meta collection, which in turn misses
> the subdir states stored in the xattr.
> 
>> But if it needs to be fetched from the rest of the cluster, I had a
>> different type of problem?
> 
> maybe you can rebuild that osd? the meta collection is different from
> one osd to another, i am not sure if we can copy it over from anther
> osd.

Well, the trouble started when I wanted to grow from 1 replica to 3
replicas. (So what is lost on osd.7 is not available elsewhere.)

I did this to see how growing the replica count behaves, and then
osd.7 crashed. From what I could find in the log files, it complained
about the missing maps or attributes.

I guess I'll just scrap the cluster and start fresh, because this could
very well be a FreeBSD-ish problem in combination with me doing silly
things. And I do not need to keep the cluster around, because it has
grown to a size that eats all my disk space on the test hardware.
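
Should I want to keep it alive after all, I suppose the brute-force way
out would be to declare the dead OSD lost and recreate the PGs whose
only copy lived on it. An untested sketch (pgid taken from the log
above; the data in those PGs is gone either way):

 # tell the cluster to stop waiting for osd.7
 ceph osd lost 7 --yes-i-really-mean-it
 # recreate the stuck PGs as empty (8.e9 is just the one from the log;
 # repeat for each PG that is still down)
 ceph pg force_create_pg 8.e9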

--WjW




