* RE: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
       [not found]                               ` <f9adb4b2dcada947f418b6f95ad7a8d1@mail.meizo.com>
@ 2015-04-28 20:19                                 ` Sage Weil
       [not found]                                   ` <alpine.DEB.2.00.1504281256440.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-04-28 20:19 UTC (permalink / raw)
  To: Tuomas Juntunen; +Cc: ceph-users, ceph-devel

[adding ceph-devel]

Okay, I see the problem.  This seems to be unrelated to the giant -> 
hammer move... it's a result of the tiering changes you made:

> > > > > > The following:
> > > > > > 
> > > > > > ceph osd tier add img images --force-nonempty
> > > > > > ceph osd tier cache-mode images forward 
> > > > > > ceph osd tier set-overlay img images

Specifically, --force-nonempty bypassed important safety checks.

1. images had snapshots (and removed_snaps)

2. images was added as a tier *of* img, and img's removed_snaps was copied 
to images, clobbering the removed_snaps value (see 
OSDMap::Incremental::propagate_snaps_to_tiers)

3. tiering relation was undone, but removed_snaps was still gone

4. on OSD startup, when we load the PG, removed_snaps is initialized with 
the older map.  Later, in PGPool::update(), we assume that removed_snaps 
always grows (never shrinks) and we trigger an assert.
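
To make the failure concrete, here is a minimal, self-contained C++ sketch of
that invariant (illustration only: SimpleIntervalSet stands in for
interval_set<snapid_t> and is not the real implementation; it only models the
size accounting that trips the assert):

    #include <cassert>
    #include <cstdint>
    #include <map>

    // Simplified stand-in for interval_set<snapid_t>: a start->length map
    // plus the running size that the real code asserts on.
    struct SimpleIntervalSet {
      std::map<uint64_t, uint64_t> m;
      int64_t size = 0;

      void insert(uint64_t start, uint64_t len) {
        m[start] = len;
        size += len;
      }

      // Erase [start, start+len).  Like interval_set<T>::erase(), the size
      // accounting assumes the erased range is already present in the set.
      void erase(uint64_t start, uint64_t len) {
        size -= static_cast<int64_t>(len);
        assert(size >= 0);              // analogue of interval_set.h:385
        auto it = m.find(start);
        if (it == m.end())
          return;
        if (it->second <= len) {
          m.erase(it);
        } else {
          uint64_t rest = it->second - len;
          m.erase(it);
          m[start + len] = rest;
        }
      }

      // Only well-defined when 'other' is a subset of *this, which is exactly
      // the "removed_snaps never shrinks" assumption.
      void subtract(const SimpleIntervalSet& other) {
        for (const auto& kv : other.m)
          erase(kv.first, kv.second);
      }
    };

    int main() {
      SimpleIntervalSet cached, from_new_map;
      cached.insert(1, 23);        // removed_snaps as the PG last saw it
      from_new_map.insert(1, 5);   // clobbered (smaller) value in the newer map
      // PGPool::update() effectively does new.subtract(cached); the cached set
      // is no longer a subset, so the size goes negative and the assert fires.
      from_new_map.subtract(cached);
      return 0;
    }

Running this aborts in erase(), which is the same abort the OSDs hit on every
startup once they load a map whose removed_snaps shrank.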

To fix this I think we need to do 2 things:

1. make the OSD forgiving of removed_snaps getting smaller.  This is 
probably a good thing anyway: once we know snaps are removed on all OSDs 
we can prune the interval_set in the OSDMap.  Maybe.

2. Fix the mon to prevent this from happening, *even* when 
--force-nonempty is specified.  (This is the root cause.)
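
For illustration only, a rough sketch of the kind of mon-side guard (2) implies;
the names here (PoolSnapInfo, tier_add_allowed) are invented for this example
and are not the actual OSDMonitor code:

    #include <cstdint>
    #include <set>

    // Hypothetical, simplified view of the snapshot state a pool carries.
    struct PoolSnapInfo {
      std::set<uint64_t> pool_snaps;    // user-visible pool snapshots
      bool has_removed_snaps = false;   // non-empty removed_snaps interval set
    };

    // Sketch of a check for "ceph osd tier add <base> <tier>": refuse if the
    // prospective tier pool already carries snapshot state, because
    // propagate_snaps_to_tiers() would clobber it with the base pool's values.
    // --force-nonempty should only bypass the "pool already has objects"
    // check, never this one.
    bool tier_add_allowed(const PoolSnapInfo& tier, bool force_nonempty) {
      if (!tier.pool_snaps.empty() || tier.has_removed_snaps)
        return false;                   // not overridable
      (void)force_nonempty;
      return true;
    }

    int main() {
      PoolSnapInfo images;              // the pool being added as a tier
      images.has_removed_snaps = true;  // matches the setup reported above
      return tier_add_allowed(images, /*force_nonempty=*/true) ? 0 : 1;
    }

The real check may end up looking quite different; the point is only that the
snapshot-state test has to sit outside whatever --force-nonempty can bypass.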

I've opened http://tracker.ceph.com/issues/11493 to track this.

sage

    

> > > > > > 
> > > > > > Idea was to make images as a tier to img, move data to img 
> > > > > > then change
> > > > > clients to use the new img pool.
> > > > > > 
> > > > > > Br,
> > > > > > Tuomas
> > > > > > 
> > > > > > > Can you explain exactly what you mean by:
> > > > > > >
> > > > > > > "Also I created one pool for tier to be able to move data 
> > > > > > > without
> > > > > outage."
> > > > > > >
> > > > > > > -Sam
> > > > > > > ----- Original Message -----
> > > > > > > From: "tuomas juntunen" <tuomas.juntunen@databasement.fi>
> > > > > > > To: "Ian Colle" <icolle@redhat.com>
> > > > > > > Cc: ceph-users@lists.ceph.com
> > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and 
> > > > > > > after some basic operations most of the OSD's went down
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > Any solution for this yet?
> > > > > > >
> > > > > > > Br,
> > > > > > > Tuomas
> > > > > > >
> > > > > > >> It looks like you may have hit
> > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > >>
> > > > > > >> Ian R. Colle
> > > > > > >> Global Director
> > > > > > >> of Software Engineering
> > > > > > >> Red Hat (Inktank is now part of Red Hat!) 
> > > > > > >> http://www.linkedin.com/in/ircolle
> > > > > > >> http://www.twitter.com/ircolle
> > > > > > >> Cell: +1.303.601.7713
> > > > > > >> Email: icolle@redhat.com
> > > > > > >>
> > > > > > >> ----- Original Message -----
> > > > > > >> From: "tuomas juntunen" <tuomas.juntunen@databasement.fi>
> > > > > > >> To: ceph-users@lists.ceph.com
> > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and 
> > > > > > >> after some basic operations most of the OSD's went down
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > >>
> > > > > > >> Then created new pools and deleted some old ones. Also I 
> > > > > > >> created one pool for tier to be able to move data without
> outage.
> > > > > > >>
> > > > > > >> After these operations all but 10 OSD's are down and 
> > > > > > >> creating this kind of messages to logs, I get more than 
> > > > > > >> 100gb of these in a
> > > > night:
> > > > > > >>
> > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23
> pg_epoch:
> > 
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > >> NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > >> NOTIFY] enter Started
> > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > >> NOTIFY] enter Start
> > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > >> NOTIFY]
> > > > > > >> state<Start>: transitioning to Stray
> > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > >> NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > >> NOTIFY] enter Started/Stray
> > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive] exit Reset 7.511623 45 0.000165
> > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive] enter Started
> > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive] enter Start
> > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive]
> > > > > > >> state<Start>: transitioning to Primary
> > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive] exit Start 0.000023 0 0.000000
> > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive] enter Started/Primary
> > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> inactive] enter Started/Primary/Peering
> > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > >> peering] enter Started/Primary/Peering/GetInfo
> > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > ./include/interval_set.h:
> > > > > > >> In
> > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> snapid_t]' 
> > > > > > >> thread
> > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> > > > > > >>
> > > > > > >>  ceph version 0.94.1
> > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, 
> > > > > > >> int, char
> > > > > > >> const*)+0x8b)
> > > > > > >> [0xbc271b]
> > > > > > >>  2: 
> > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t>
> > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>, 
> > > > > > >> const>std::vector<int,
> > > > > > >> std::allocator<int> >&, int, std::vector<int, 
> > > > > > >> std::allocator<int>
> > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*, 
> > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*, 
> > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > >> std::less<boost::intrusive_ptr<PG> >, 
> > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) 
> > > > > > >> [0x6b0e43]
> > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
> > > > > > >> std::allocator<PG*>
> > > > > > >> > const&,
> > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > >> std::allocator<PG*>
> > > > > > >> > const&,
> > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > >> [0xbb38ae]
> > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > >>  10: (()+0x8182) [0x7fd906946182]
> > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > >>
> > > > > > >> Also by monitoring (ceph -w) I get the following messages, 
> > > > > > >> also lots of
> > > > > them.
> > > > > > >>
> > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > 10.20.0.13:0/1174409'
> > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush create-or-move",
> > "args":
> > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: 
> > > > > > >> dispatch
> > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > 10.20.0.13:0/1174483'
> > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush create-or-move",
> > "args":
> > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: 
> > > > > > >> dispatch
> > > > > > >>
> > > > > > >>
> > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are also 
> > > > > > >> mons and mds's to save servers. All run Ubuntu 14.04.2.
> > > > > > >>
> > > > > > >> I have pretty much tried everything I could think of.
> > > > > > >>
> > > > > > >> Restarting daemons doesn't help.
> > > > > > >>
> > > > > > >> Any help would be appreciated. I can also provide more logs 
> > > > > > >> if necessary. They just seem to get pretty large in few
> moments.
> > > > > > >>
> > > > > > >> Thank you
> > > > > > >> Tuomas
> > > > > > >>
> > > > > > >>
> > > > > > >> _______________________________________________
> > > > > > >> ceph-users mailing list
> > > > > > >> ceph-users@lists.ceph.com
> > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users@lists.ceph.com
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > 
> > > > > > 
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> 
> 


* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
       [not found]                                   ` <alpine.DEB.2.00.1504281256440.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-28 20:57                                     ` Sage Weil
       [not found]                                       ` <alpine.DEB.2.00.1504281355130.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-04-28 20:57 UTC (permalink / raw)
  To: Tuomas Juntunen
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Hi Tuomas,

I've pushed an updated wip-hammer-snaps branch.  Can you please try it?  
The build will appear here

	http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286

(or a similar url; adjust for your distro).

Thanks!
sage


On Tue, 28 Apr 2015, Sage Weil wrote:

> [adding ceph-devel]
> 
> Okay, I see the problem.  This seems to be unrelated to the giant -> 
> hammer move... it's a result of the tiering changes you made:
> 
> > > > > > > The following:
> > > > > > > 
> > > > > > > ceph osd tier add img images --force-nonempty
> > > > > > > ceph osd tier cache-mode images forward 
> > > > > > > ceph osd tier set-overlay img images
> 
> Specifically, --force-nonempty bypassed important safety checks.
> 
> 1. images had snapshots (and removed_snaps)
> 
> 2. images was added as a tier *of* img, and img's removed_snaps was copied 
> to images, clobbering the removed_snaps value (see 
> OSDMap::Incremental::propagate_snaps_to_tiers)
> 
> 3. tiering relation was undone, but removed_snaps was still gone
> 
> 4. on OSD startup, when we load the PG, removed_snaps is initialized with 
> the older map.  Later, in PGPool::update(), we assume that removed_snaps 
> always grows (never shrinks) and we trigger an assert.
> 
> To fix this I think we need to do 2 things:
> 
> 1. make the OSD forgiving of removed_snaps getting smaller.  This is 
> probably a good thing anyway: once we know snaps are removed on all OSDs 
> we can prune the interval_set in the OSDMap.  Maybe.
> 
> 2. Fix the mon to prevent this from happening, *even* when 
> --force-nonempty is specified.  (This is the root cause.)
> 
> I've opened http://tracker.ceph.com/issues/11493 to track this.
> 
> sage
> 
>     
> 
> > > > > > > 
> > > > > > > Idea was to make images as a tier to img, move data to img 
> > > > > > > then change
> > > > > > clients to use the new img pool.
> > > > > > > 
> > > > > > > Br,
> > > > > > > Tuomas
> > > > > > > 
> > > > > > > > Can you explain exactly what you mean by:
> > > > > > > >
> > > > > > > > "Also I created one pool for tier to be able to move data 
> > > > > > > > without
> > > > > > outage."
> > > > > > > >
> > > > > > > > -Sam
> > > > > > > > ----- Original Message -----
> > > > > > > > From: "tuomas juntunen" <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and 
> > > > > > > > after some basic operations most of the OSD's went down
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Any solution for this yet?
> > > > > > > >
> > > > > > > > Br,
> > > > > > > > Tuomas
> > > > > > > >
> > > > > > > >> It looks like you may have hit
> > > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > > >>
> > > > > > > >> Ian R. Colle
> > > > > > > >> Global Director
> > > > > > > >> of Software Engineering
> > > > > > > >> Red Hat (Inktank is now part of Red Hat!) 
> > > > > > > >> http://www.linkedin.com/in/ircolle
> > > > > > > >> http://www.twitter.com/ircolle
> > > > > > > >> Cell: +1.303.601.7713
> > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > > > > > >>
> > > > > > > >> ----- Original Message -----
> > > > > > > >> From: "tuomas juntunen" <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and 
> > > > > > > >> after some basic operations most of the OSD's went down
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > > >>
> > > > > > > >> Then created new pools and deleted some old ones. Also I 
> > > > > > > >> created one pool for tier to be able to move data without
> > outage.
> > > > > > > >>
> > > > > > > >> After these operations all but 10 OSD's are down and 
> > > > > > > >> creating this kind of messages to logs, I get more than 
> > > > > > > >> 100gb of these in a
> > > > > night:
> > > > > > > >>
> > > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838 
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > > >> NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > > >> NOTIFY] enter Started
> > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > > >> NOTIFY] enter Start
> > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > > >> NOTIFY]
> > > > > > > >> state<Start>: transitioning to Stray
> > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > > >> NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive 
> > > > > > > >> NOTIFY] enter Started/Stray
> > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive] exit Reset 7.511623 45 0.000165
> > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive] enter Started
> > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive] enter Start
> > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive]
> > > > > > > >> state<Start>: transitioning to Primary
> > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive] exit Start 0.000023 0 0.000000
> > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive] enter Started/Primary
> > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> inactive] enter Started/Primary/Peering
> > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 
> > > > > > > >> peering] enter Started/Primary/Peering/GetInfo
> > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > > ./include/interval_set.h:
> > > > > > > >> In
> > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> > snapid_t]' 
> > > > > > > >> thread
> > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> > > > > > > >>
> > > > > > > >>  ceph version 0.94.1
> > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, 
> > > > > > > >> int, char
> > > > > > > >> const*)+0x8b)
> > > > > > > >> [0xbc271b]
> > > > > > > >>  2: 
> > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t>
> > > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>, 
> > > > > > > >> const>std::vector<int,
> > > > > > > >> std::allocator<int> >&, int, std::vector<int, 
> > > > > > > >> std::allocator<int>
> > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*, 
> > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*, 
> > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > > >> std::less<boost::intrusive_ptr<PG> >, 
> > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) 
> > > > > > > >> [0x6b0e43]
> > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > > >> [0xbb38ae]
> > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
> > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > > >>
> > > > > > > >> Also by monitoring (ceph -w) I get the following messages, 
> > > > > > > >> also lots of
> > > > > > them.
> > > > > > > >>
> > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174409'
> > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: 
> > > > > > > >> dispatch
> > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174483'
> > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: 
> > > > > > > >> dispatch
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are also 
> > > > > > > >> mons and mds's to save servers. All run Ubuntu 14.04.2.
> > > > > > > >>
> > > > > > > >> I have pretty much tried everything I could think of.
> > > > > > > >>
> > > > > > > >> Restarting daemons doesn't help.
> > > > > > > >>
> > > > > > > >> Any help would be appreciated. I can also provide more logs 
> > > > > > > >> if necessary. They just seem to get pretty large in few
> > moments.
> > > > > > > >>
> > > > > > > >> Thank you
> > > > > > > >> Tuomas
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> _______________________________________________
> > > > > > > >> ceph-users mailing list
> > > > > > > >> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 


* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
       [not found]                                       ` <alpine.DEB.2.00.1504281355130.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-29  4:16                                         ` Tuomas Juntunen
       [not found]                                           ` <81216125e573cf00539f61cc090b282b-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Tuomas Juntunen @ 2015-04-29  4:16 UTC (permalink / raw)
  To: 'Sage Weil'
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 17530 bytes --]

Hi

I updated to that version and it seems that something did happen: the OSDs
stayed up for a while and 'ceph status' got updated. But then, in a couple
of minutes, they all went down the same way.

I have attached a new 'ceph osd dump -f json-pretty' and captured a new log
from one of the OSDs with osd debug = 20:
http://beta.xaasbox.com/ceph/ceph-osd.15.log

Thank you!

Br,
Tuomas



-----Original Message-----
From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org] 
Sent: 28. huhtikuuta 2015 23:57
To: Tuomas Juntunen
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down

Hi Tuomas,

I've pushed an updated wip-hammer-snaps branch.  Can you please try it?  
The build will appear here

	
http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286

(or a similar url; adjust for your distro).

Thanks!
sage


On Tue, 28 Apr 2015, Sage Weil wrote:

> [adding ceph-devel]
> 
> Okay, I see the problem.  This seems to be unrelated to the giant -> 
> hammer move... it's a result of the tiering changes you made:
> 
> > > > > > > The following:
> > > > > > > 
> > > > > > > ceph osd tier add img images --force-nonempty
> > > > > > > ceph osd tier cache-mode images forward
> > > > > > > ceph osd tier set-overlay img images
> 
> Specifically, --force-nonempty bypassed important safety checks.
> 
> 1. images had snapshots (and removed_snaps)
> 
> 2. images was added as a tier *of* img, and img's removed_snaps was 
> copied to images, clobbering the removed_snaps value (see
> OSDMap::Incremental::propagate_snaps_to_tiers)
> 
> 3. tiering relation was undone, but removed_snaps was still gone
> 
> 4. on OSD startup, when we load the PG, removed_snaps is initialized 
> with the older map.  Later, in PGPool::update(), we assume that 
> removed_snaps always grows (never shrinks) and we trigger an assert.
> 
> To fix this I think we need to do 2 things:
> 
> 1. make the OSD forgiving of removed_snaps getting smaller.  This is 
> probably a good thing anyway: once we know snaps are removed on all 
> OSDs we can prune the interval_set in the OSDMap.  Maybe.
> 
> 2. Fix the mon to prevent this from happening, *even* when 
> --force-nonempty is specified.  (This is the root cause.)
> 
> I've opened http://tracker.ceph.com/issues/11493 to track this.
> 
> sage
> 
>     
> 
> > > > > > > 
> > > > > > > Idea was to make images as a tier to img, move data to img 
> > > > > > > then change
> > > > > > clients to use the new img pool.
> > > > > > > 
> > > > > > > Br,
> > > > > > > Tuomas
> > > > > > > 
> > > > > > > > Can you explain exactly what you mean by:
> > > > > > > >
> > > > > > > > "Also I created one pool for tier to be able to move 
> > > > > > > > data without
> > > > > > outage."
> > > > > > > >
> > > > > > > > -Sam
> > > > > > > > ----- Original Message -----
> > > > > > > > From: "tuomas juntunen" 
> > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer 
> > > > > > > > and after some basic operations most of the OSD's went 
> > > > > > > > down
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Any solution for this yet?
> > > > > > > >
> > > > > > > > Br,
> > > > > > > > Tuomas
> > > > > > > >
> > > > > > > >> It looks like you may have hit
> > > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > > >>
> > > > > > > >> Ian R. Colle
> > > > > > > >> Global Director
> > > > > > > >> of Software Engineering Red Hat (Inktank is now part of 
> > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
> > > > > > > >> http://www.twitter.com/ircolle
> > > > > > > >> Cell: +1.303.601.7713
> > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > > > > > >>
> > > > > > > >> ----- Original Message -----
> > > > > > > >> From: "tuomas juntunen" 
> > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and 
> > > > > > > >> after some basic operations most of the OSD's went down
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > > >>
> > > > > > > >> Then created new pools and deleted some old ones. Also 
> > > > > > > >> I created one pool for tier to be able to move data 
> > > > > > > >> without
> > outage.
> > > > > > > >>
> > > > > > > >> After these operations all but 10 OSD's are down and 
> > > > > > > >> creating this kind of messages to logs, I get more than 
> > > > > > > >> 100gb of these in a
> > > > > night:
> > > > > > > >>
> > > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23
> > pg_epoch:
> > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > >> inactive NOTIFY] enter Started
> > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > >> inactive NOTIFY] enter Start
> > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > >> inactive NOTIFY]
> > > > > > > >> state<Start>: transitioning to Stray
> > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > >> inactive NOTIFY] enter Started/Stray
> > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
> > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive] enter Started
> > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive] enter Start
> > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive]
> > > > > > > >> state<Start>: transitioning to Primary
> > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
> > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive] enter Started/Primary
> > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
> > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 
> > > > > > > >> osd.23
> > > pg_epoch:
> > > > 
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
> > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > > ./include/interval_set.h:
> > > > > > > >> In
> > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> > snapid_t]' 
> > > > > > > >> thread
> > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 
> > > > > > > >> 0)
> > > > > > > >>
> > > > > > > >>  ceph version 0.94.1
> > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, 
> > > > > > > >> int, char
> > > > > > > >> const*)+0x8b)
> > > > > > > >> [0xbc271b]
> > > > > > > >>  2: 
> > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
> > > > > > > >> >
> > > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>, 
> > > > > > > >> const>std::vector<int,
> > > > > > > >> std::allocator<int> >&, int, std::vector<int, 
> > > > > > > >> std::allocator<int>
> > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*, 
> > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*, 
> > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > > >> std::less<boost::intrusive_ptr<PG> >, 
> > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) 
> > > > > > > >> [0x6b0e43]
> > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > > >> [0xbb38ae]
> > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
> > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > > >>
> > > > > > > >> Also by monitoring (ceph -w) I get the following 
> > > > > > > >> messages, also lots of
> > > > > > them.
> > > > > > > >>
> > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174409'
> > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush 
> > > > > > > >> create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:

> > > > > > > >> dispatch
> > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174483'
> > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush 
> > > > > > > >> create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:

> > > > > > > >> dispatch
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are 
> > > > > > > >> also mons and mds's to save servers. All run Ubuntu
14.04.2.
> > > > > > > >>
> > > > > > > >> I have pretty much tried everything I could think of.
> > > > > > > >>
> > > > > > > >> Restarting daemons doesn't help.
> > > > > > > >>
> > > > > > > >> Any help would be appreciated. I can also provide more 
> > > > > > > >> logs if necessary. They just seem to get pretty large 
> > > > > > > >> in few
> > moments.
> > > > > > > >>
> > > > > > > >> Thank you
> > > > > > > >> Tuomas
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> _______________________________________________
> > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 

[-- Attachment #2: 18610json.pretty.txt --]
[-- Type: text/plain, Size: 93942 bytes --]


{
    "epoch": 18610,
    "fsid": "a2974742-3805-4cd3-bc79-765f2bddaefe",
    "created": "2014-10-15 20:43:45.186949",
    "modified": "2015-04-29 06:49:32.691995",
    "flags": "",
    "cluster_snapshot": "",
    "pool_max": 17,
    "max_osd": 71,
    "pools": [
        {
            "pool": 0,
            "pool_name": "data",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 45,
            "last_change": "1112",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 0,
            "cache_target_full_ratio_micro": 0,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 1,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 1,
            "pool_name": "metadata",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 0,
            "last_change": "1114",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 0,
            "cache_target_full_ratio_micro": 0,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 1,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 2,
            "pool_name": "rbd",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 2,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 0,
            "last_change": "1116",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 0,
            "cache_target_full_ratio_micro": 0,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 1,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 3,
            "pool_name": "volumes",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 0,
            "last_change": "9974",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 23,
            "snap_epoch": 9974,
            "pool_snaps": [],
            "removed_snaps": "[1~17]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "default",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 1,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 4,
            "pool_name": "images",
            "flags": 9,
            "flags_names": "hashpspool,incomplete_clones",
            "type": 1,
            "size": 3,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 0,
            "last_change": "17905",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 17882,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 0,
            "cache_target_full_ratio_micro": 0,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "default",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 1,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 6,
            "pool_name": "vms",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 0,
            "last_change": "1122",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "default",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 1,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 7,
            "pool_name": "san",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 1,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 4096,
            "pg_placement_num": 4096,
            "crash_replay_interval": 0,
            "last_change": "14096",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 8,
            "pool_name": "vol-ssd-accelerated",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 2,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 1024,
            "pg_placement_num": 1024,
            "crash_replay_interval": 0,
            "last_change": "17861",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 14,
            "pool_name": "backup",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 2,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 128,
            "pg_placement_num": 128,
            "crash_replay_interval": 0,
            "last_change": "18018",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 15,
            "pool_name": "img",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 2,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 256,
            "pg_placement_num": 256,
            "crash_replay_interval": 0,
            "last_change": "18019",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 16,
            "pool_name": "vm",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 2,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 1024,
            "pg_placement_num": 1024,
            "crash_replay_interval": 0,
            "last_change": "18020",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        },
        {
            "pool": 17,
            "pool_name": "infradisks",
            "flags": 1,
            "flags_names": "hashpspool",
            "type": 1,
            "size": 3,
            "min_size": 2,
            "crush_ruleset": 0,
            "object_hash": 2,
            "pg_num": 256,
            "pg_placement_num": 256,
            "crash_replay_interval": 0,
            "last_change": "18021",
            "last_force_op_resend": "0",
            "auid": 0,
            "snap_mode": "selfmanaged",
            "snap_seq": 0,
            "snap_epoch": 0,
            "pool_snaps": [],
            "removed_snaps": "[]",
            "quota_max_bytes": 0,
            "quota_max_objects": 0,
            "tiers": [],
            "tier_of": -1,
            "read_tier": -1,
            "write_tier": -1,
            "cache_mode": "none",
            "target_max_bytes": 0,
            "target_max_objects": 0,
            "cache_target_dirty_ratio_micro": 400000,
            "cache_target_full_ratio_micro": 800000,
            "cache_min_flush_age": 0,
            "cache_min_evict_age": 0,
            "erasure_code_profile": "",
            "hit_set_params": {
                "type": "none"
            },
            "hit_set_period": 0,
            "hit_set_count": 0,
            "min_read_recency_for_promote": 0,
            "stripe_width": 0,
            "expected_num_objects": 0
        }
    ],
    "osds": [
        {
            "osd": 0,
            "uuid": "757c3bc5-4d00-4344-8de4-82f5379c96af",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15738,
            "last_clean_end": 17882,
            "up_from": 18352,
            "up_thru": 18353,
            "down_at": 18415,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6833\/2259607",
            "cluster_addr": "10.20.0.11:6836\/2259607",
            "heartbeat_back_addr": "10.20.0.11:6853\/2259607",
            "heartbeat_front_addr": "10.20.0.11:6856\/2259607",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 1,
            "uuid": "c7eaa4ac-99fc-46db-84aa-a67274896ec8",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15740,
            "last_clean_end": 17882,
            "up_from": 18350,
            "up_thru": 18352,
            "down_at": 18403,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6813\/2259893",
            "cluster_addr": "10.20.0.11:6814\/2259893",
            "heartbeat_back_addr": "10.20.0.11:6815\/2259893",
            "heartbeat_front_addr": "10.20.0.11:6825\/2259893",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 2,
            "uuid": "206b2949-4adf-4789-8e06-f68a8ee819c9",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15739,
            "last_clean_end": 17882,
            "up_from": 18348,
            "up_thru": 18348,
            "down_at": 18415,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6809\/2259657",
            "cluster_addr": "10.20.0.11:6810\/2259657",
            "heartbeat_back_addr": "10.20.0.11:6811\/2259657",
            "heartbeat_front_addr": "10.20.0.11:6812\/2259657",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 3,
            "uuid": "90b7c219-4dcd-48ea-a24d-f3b796a521e4",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15736,
            "last_clean_end": 17882,
            "up_from": 18346,
            "up_thru": 18346,
            "down_at": 18412,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6829\/2257497",
            "cluster_addr": "10.20.0.11:6830\/2257497",
            "heartbeat_back_addr": "10.20.0.11:6831\/2257497",
            "heartbeat_front_addr": "10.20.0.11:6832\/2257497",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 4,
            "uuid": "049ef94f-121a-4e71-8ba6-27eaebf0a569",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15737,
            "last_clean_end": 17883,
            "up_from": 18342,
            "up_thru": 18345,
            "down_at": 18415,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6861\/2257349",
            "cluster_addr": "10.20.0.11:6862\/2257349",
            "heartbeat_back_addr": "10.20.0.11:6863\/2257349",
            "heartbeat_front_addr": "10.20.0.11:6864\/2257349",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 5,
            "uuid": "2437a53b-339e-45af-b4de-0fc675d27405",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15734,
            "last_clean_end": 17882,
            "up_from": 18347,
            "up_thru": 18347,
            "down_at": 18403,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6821\/2256278",
            "cluster_addr": "10.20.0.11:6822\/2256278",
            "heartbeat_back_addr": "10.20.0.11:6823\/2256278",
            "heartbeat_front_addr": "10.20.0.11:6824\/2256278",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 6,
            "uuid": "f117ceed-b1fd-4069-99fe-b7aba9f3ef8d",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15738,
            "last_clean_end": 17882,
            "up_from": 18349,
            "up_thru": 18349,
            "down_at": 18415,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6854\/2257155",
            "cluster_addr": "10.20.0.11:6855\/2257155",
            "heartbeat_back_addr": "10.20.0.11:6857\/2257155",
            "heartbeat_front_addr": "10.20.0.11:6859\/2257155",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 7,
            "uuid": "e98e9b8a-9c62-4e3e-bdb4-c2c30103c0c1",
            "up": 0,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15730,
            "last_clean_end": 17883,
            "up_from": 18345,
            "up_thru": 18345,
            "down_at": 18419,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6873\/2258645",
            "cluster_addr": "10.20.0.11:6874\/2258645",
            "heartbeat_back_addr": "10.20.0.11:6875\/2258645",
            "heartbeat_front_addr": "10.20.0.11:6876\/2258645",
            "state": [
                "exists"
            ]
        },
        {
            "osd": 8,
            "uuid": "41e471cd-fafe-4422-8bf5-22018bbe1375",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 17795,
            "last_clean_end": 17882,
            "up_from": 18346,
            "up_thru": 18346,
            "down_at": 18412,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6877\/2258943",
            "cluster_addr": "10.20.0.11:6878\/2258943",
            "heartbeat_back_addr": "10.20.0.11:6879\/2258943",
            "heartbeat_front_addr": "10.20.0.11:6880\/2258943",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 9,
            "uuid": "d68eeebd-d058-4b1c-a30a-994bf8fc8030",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15733,
            "last_clean_end": 17882,
            "up_from": 18347,
            "up_thru": 18347,
            "down_at": 18410,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6849\/2258152",
            "cluster_addr": "10.20.0.11:6850\/2258152",
            "heartbeat_back_addr": "10.20.0.11:6851\/2258152",
            "heartbeat_front_addr": "10.20.0.11:6852\/2258152",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 10,
            "uuid": "660747d6-3f47-449a-bc69-5399b0d54ff6",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15736,
            "last_clean_end": 17883,
            "up_from": 18345,
            "up_thru": 18346,
            "down_at": 18403,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6841\/2256646",
            "cluster_addr": "10.20.0.11:6842\/2256646",
            "heartbeat_back_addr": "10.20.0.11:6843\/2256646",
            "heartbeat_front_addr": "10.20.0.11:6844\/2256646",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 11,
            "uuid": "805965b1-127f-44a6-9a05-8a643eb7a512",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 17801,
            "last_clean_end": 17882,
            "up_from": 18349,
            "up_thru": 18351,
            "down_at": 18439,
            "lost_at": 0,
            "public_addr": "10.20.0.11:6801\/2257816",
            "cluster_addr": "10.20.0.11:6802\/2257816",
            "heartbeat_back_addr": "10.20.0.11:6803\/2257816",
            "heartbeat_front_addr": "10.20.0.11:6804\/2257816",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 12,
            "uuid": "61fbfcbe-d642-478f-9620-f9d72ee96238",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18162,
            "up_thru": 18162,
            "down_at": 18208,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6833\/3261949",
            "cluster_addr": "10.20.0.12:6834\/3261949",
            "heartbeat_back_addr": "10.20.0.12:6835\/3261949",
            "heartbeat_front_addr": "10.20.0.12:6836\/3261949",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 13,
            "uuid": "6faad33b-00be-42a4-92ba-08be5ab7f995",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18164,
            "up_thru": 18166,
            "down_at": 18206,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6885\/3262416",
            "cluster_addr": "10.20.0.12:6886\/3262416",
            "heartbeat_back_addr": "10.20.0.12:6887\/3262416",
            "heartbeat_front_addr": "10.20.0.12:6888\/3262416",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 14,
            "uuid": "f301705b-e725-443d-96e1-d9ec9aafe657",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17883,
            "up_from": 18164,
            "up_thru": 18164,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6861\/3261624",
            "cluster_addr": "10.20.0.12:6862\/3261624",
            "heartbeat_back_addr": "10.20.0.12:6863\/3261624",
            "heartbeat_front_addr": "10.20.0.12:6864\/3261624",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 15,
            "uuid": "536bf483-10de-44b0-8e1e-4f349fbe572a",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15785,
            "last_clean_end": 17882,
            "up_from": 18168,
            "up_thru": 18168,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6805\/3262650",
            "cluster_addr": "10.20.0.12:6806\/3262650",
            "heartbeat_back_addr": "10.20.0.12:6807\/3262650",
            "heartbeat_front_addr": "10.20.0.12:6808\/3262650",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 16,
            "uuid": "4185bd20-8eb0-4616-b36e-bacb181ae40e",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18164,
            "up_thru": 18164,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6849\/3261188",
            "cluster_addr": "10.20.0.12:6850\/3261188",
            "heartbeat_back_addr": "10.20.0.12:6851\/3261188",
            "heartbeat_front_addr": "10.20.0.12:6852\/3261188",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 17,
            "uuid": "a6f2f5b4-477f-48f9-9acf-d5b7a6c88b98",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18164,
            "up_thru": 18166,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6857\/3261610",
            "cluster_addr": "10.20.0.12:6858\/3261610",
            "heartbeat_back_addr": "10.20.0.12:6859\/3261610",
            "heartbeat_front_addr": "10.20.0.12:6860\/3261610",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 18,
            "uuid": "b31b0bd8-938a-496d-91bc-19bf4f794f82",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17883,
            "up_from": 18164,
            "up_thru": 18166,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6869\/3261788",
            "cluster_addr": "10.20.0.12:6870\/3261788",
            "heartbeat_back_addr": "10.20.0.12:6871\/3261788",
            "heartbeat_front_addr": "10.20.0.12:6872\/3261788",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 19,
            "uuid": "d76b6bd5-1ef3-436c-a75d-3587c515eb56",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18150,
            "up_thru": 18150,
            "down_at": 18203,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6865\/3261778",
            "cluster_addr": "10.20.0.12:6866\/3261778",
            "heartbeat_back_addr": "10.20.0.12:6867\/3261778",
            "heartbeat_front_addr": "10.20.0.12:6868\/3261778",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 20,
            "uuid": "8e4dd982-a4c5-4ca5-9fc5-243f55c4db57",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18151,
            "up_thru": 18151,
            "down_at": 18239,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6881\/3262190",
            "cluster_addr": "10.20.0.12:6882\/3262190",
            "heartbeat_back_addr": "10.20.0.12:6883\/3262190",
            "heartbeat_front_addr": "10.20.0.12:6884\/3262190",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 21,
            "uuid": "760aaf28-0a34-4bbc-af0c-2654b0a43fff",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18150,
            "up_thru": 18150,
            "down_at": 18203,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6845\/3261106",
            "cluster_addr": "10.20.0.12:6846\/3261106",
            "heartbeat_back_addr": "10.20.0.12:6847\/3261106",
            "heartbeat_front_addr": "10.20.0.12:6848\/3261106",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 22,
            "uuid": "40322a34-ab31-4760-b71e-a7672f812cb3",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18161,
            "up_thru": 18161,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6853\/3261379",
            "cluster_addr": "10.20.0.12:6854\/3261379",
            "heartbeat_back_addr": "10.20.0.12:6855\/3261379",
            "heartbeat_front_addr": "10.20.0.12:6856\/3261379",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 23,
            "uuid": "e1d81949-f4b5-4cf2-b6af-dccaaeb30ed7",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15783,
            "last_clean_end": 17882,
            "up_from": 18165,
            "up_thru": 18166,
            "down_at": 18352,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6873\/3262047",
            "cluster_addr": "10.20.0.12:6874\/3262047",
            "heartbeat_back_addr": "10.20.0.12:6875\/3262047",
            "heartbeat_front_addr": "10.20.0.12:6876\/3262047",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 24,
            "uuid": "ede77283-a423-4c6b-9c6e-b0e807c63cb5",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18520,
            "last_clean_end": 18582,
            "up_from": 18589,
            "up_thru": 18592,
            "down_at": 18588,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6801\/3842583",
            "cluster_addr": "10.20.0.13:6839\/3842583",
            "heartbeat_back_addr": "10.20.0.13:6840\/3842583",
            "heartbeat_front_addr": "10.20.0.13:6841\/3842583",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 25,
            "uuid": "7cfe85f8-3ae9-493d-9801-025ff6c6265d",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15686,
            "last_clean_end": 17883,
            "up_from": 18426,
            "up_thru": 18426,
            "down_at": 18518,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6829\/3788954",
            "cluster_addr": "10.20.0.13:6830\/3788954",
            "heartbeat_back_addr": "10.20.0.13:6831\/3788954",
            "heartbeat_front_addr": "10.20.0.13:6832\/3788954",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 26,
            "uuid": "266f6d70-519f-4c24-bca2-236495a600a7",
            "up": 0,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15692,
            "last_clean_end": 17883,
            "up_from": 18420,
            "up_thru": 18421,
            "down_at": 18542,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6873\/3788357",
            "cluster_addr": "10.20.0.13:6874\/3788357",
            "heartbeat_back_addr": "10.20.0.13:6875\/3788357",
            "heartbeat_front_addr": "10.20.0.13:6876\/3788357",
            "state": [
                "exists"
            ]
        },
        {
            "osd": 27,
            "uuid": "68644fa9-9459-4db0-a6c9-01661645038b",
            "up": 0,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15689,
            "last_clean_end": 17883,
            "up_from": 18420,
            "up_thru": 18421,
            "down_at": 18527,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6813\/3788083",
            "cluster_addr": "10.20.0.13:6814\/3788083",
            "heartbeat_back_addr": "10.20.0.13:6815\/3788083",
            "heartbeat_front_addr": "10.20.0.13:6816\/3788083",
            "state": [
                "exists"
            ]
        },
        {
            "osd": 28,
            "uuid": "fc3d5749-7673-4100-a0d4-f25e9cc0bc88",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15688,
            "last_clean_end": 17882,
            "up_from": 18424,
            "up_thru": 18424,
            "down_at": 18518,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6825\/3789248",
            "cluster_addr": "10.20.0.13:6826\/3789248",
            "heartbeat_back_addr": "10.20.0.13:6827\/3789248",
            "heartbeat_front_addr": "10.20.0.13:6828\/3789248",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 29,
            "uuid": "cb5feda9-de3f-4e42-bb73-7945b4928b22",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18511,
            "last_clean_end": 18534,
            "up_from": 18544,
            "up_thru": 18571,
            "down_at": 18543,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6817\/3815548",
            "cluster_addr": "10.20.0.13:6868\/3815548",
            "heartbeat_back_addr": "10.20.0.13:6869\/3815548",
            "heartbeat_front_addr": "10.20.0.13:6870\/3815548",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 30,
            "uuid": "ef1e65bb-a634-4096-9466-1262af55db01",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15693,
            "last_clean_end": 17882,
            "up_from": 18437,
            "up_thru": 18585,
            "down_at": 17884,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6833\/3787367",
            "cluster_addr": "10.20.0.13:6834\/3787367",
            "heartbeat_back_addr": "10.20.0.13:6835\/3787367",
            "heartbeat_front_addr": "10.20.0.13:6836\/3787367",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 31,
            "uuid": "3dad6393-67a8-43d4-ba8d-ffd320827396",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18534,
            "last_clean_end": 18551,
            "up_from": 18562,
            "up_thru": 18581,
            "down_at": 18561,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6842\/3819894",
            "cluster_addr": "10.20.0.13:6864\/3819894",
            "heartbeat_back_addr": "10.20.0.13:6865\/3819894",
            "heartbeat_front_addr": "10.20.0.13:6871\/3819894",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 32,
            "uuid": "db6f3afa-53ed-453a-97e3-861e88cb818f",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15684,
            "last_clean_end": 17882,
            "up_from": 18419,
            "up_thru": 18420,
            "down_at": 18523,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6809\/3786362",
            "cluster_addr": "10.20.0.13:6810\/3786362",
            "heartbeat_back_addr": "10.20.0.13:6811\/3786362",
            "heartbeat_front_addr": "10.20.0.13:6812\/3786362",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 33,
            "uuid": "d5e59852-06b4-4a30-8c5e-ff7e328b5455",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18508,
            "last_clean_end": 18534,
            "up_from": 18551,
            "up_thru": 18577,
            "down_at": 18550,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6809\/3817103",
            "cluster_addr": "10.20.0.13:6810\/3817103",
            "heartbeat_back_addr": "10.20.0.13:6811\/3817103",
            "heartbeat_front_addr": "10.20.0.13:6812\/3817103",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 34,
            "uuid": "f35a10c5-217a-4cfb-88b9-7334bda441b8",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18521,
            "last_clean_end": 18572,
            "up_from": 18592,
            "up_thru": 18592,
            "down_at": 18591,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6805\/3842840",
            "cluster_addr": "10.20.0.13:6819\/3842840",
            "heartbeat_back_addr": "10.20.0.13:6820\/3842840",
            "heartbeat_front_addr": "10.20.0.13:6821\/3842840",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 35,
            "uuid": "335e797f-a390-4f08-9da6-9ab76ffb12ae",
            "up": 0,
            "in": 0,
            "weight": 0.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 15687,
            "last_clean_end": 17882,
            "up_from": 18424,
            "up_thru": 18424,
            "down_at": 18498,
            "lost_at": 0,
            "public_addr": "10.20.0.13:6861\/3787537",
            "cluster_addr": "10.20.0.13:6862\/3787537",
            "heartbeat_back_addr": "10.20.0.13:6863\/3787537",
            "heartbeat_front_addr": "10.20.0.13:6864\/3787537",
            "state": [
                "autoout",
                "exists"
            ]
        },
        {
            "osd": 36,
            "uuid": "33c11fa1-1b03-42e4-8296-dc55ba052b35",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18599,
            "last_clean_end": 18600,
            "up_from": 18606,
            "up_thru": 18609,
            "down_at": 18605,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6829\/3479135",
            "cluster_addr": "10.20.0.12:6830\/3479135",
            "heartbeat_back_addr": "10.20.0.12:6831\/3479135",
            "heartbeat_front_addr": "10.20.0.12:6832\/3479135",
            "state": [
                "exists",
                "up"
            ]
        },
        {
            "osd": 37,
            "uuid": "a97a791a-fe36-438b-80e2-db2a0d5e8e27",
            "up": 1,
            "in": 1,
            "weight": 1.000000,
            "primary_affinity": 1.000000,
            "last_clean_begin": 18596,
            "last_clean_end": 18600,
            "up_from": 18609,
            "up_thru": 18609,
            "down_at": 18608,
            "lost_at": 0,
            "public_addr": "10.20.0.12:6889\/3481637",
            "cluster_addr": "10.20.0.12:6890\/3481637",
            "heartbeat_back_addr": "10.20.0.12:6891\/3481637",
            "heartbeat_front_addr": "10.20.0.12:6894\/3481637",
            "state": [
                "exists",
                "up"
            ]
        }
    ],
    "osd_xinfo": [
        {
            "osd": 0,
            "down_stamp": "2015-04-29 06:36:11.510911",
            "laggy_probability": 0.648970,
            "laggy_interval": 32,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 1,
            "down_stamp": "2015-04-29 06:35:39.342646",
            "laggy_probability": 0.627290,
            "laggy_interval": 30,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 2,
            "down_stamp": "2015-04-29 06:36:11.510911",
            "laggy_probability": 0.617737,
            "laggy_interval": 47,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 3,
            "down_stamp": "2015-04-29 06:36:06.479824",
            "laggy_probability": 0.660475,
            "laggy_interval": 28,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 4,
            "down_stamp": "2015-04-29 06:36:11.510911",
            "laggy_probability": 0.642416,
            "laggy_interval": 39,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 5,
            "down_stamp": "2015-04-29 06:35:39.342646",
            "laggy_probability": 0.617737,
            "laggy_interval": 10,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 6,
            "down_stamp": "2015-04-29 06:36:11.510911",
            "laggy_probability": 0.642416,
            "laggy_interval": 41,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 7,
            "down_stamp": "2015-04-29 06:38:05.135599",
            "laggy_probability": 0.642416,
            "laggy_interval": 66,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 8,
            "down_stamp": "2015-04-29 06:36:06.479824",
            "laggy_probability": 0.449691,
            "laggy_interval": 40,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 9,
            "down_stamp": "2015-04-29 06:35:59.293041",
            "laggy_probability": 0.643535,
            "laggy_interval": 16,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 10,
            "down_stamp": "2015-04-29 06:35:39.342646",
            "laggy_probability": 0.616699,
            "laggy_interval": 48,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 11,
            "down_stamp": "2015-04-29 06:38:34.318677",
            "laggy_probability": 0.422864,
            "laggy_interval": 22,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 12,
            "down_stamp": "2015-04-29 06:30:10.761975",
            "laggy_probability": 0.594721,
            "laggy_interval": 41,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 13,
            "down_stamp": "2015-04-29 06:30:08.803695",
            "laggy_probability": 0.601756,
            "laggy_interval": 29,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 14,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.663821,
            "laggy_interval": 21,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 15,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.661855,
            "laggy_interval": 18,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 16,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.663889,
            "laggy_interval": 13,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 17,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.541368,
            "laggy_interval": 50,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 18,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.622311,
            "laggy_interval": 35,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 19,
            "down_stamp": "2015-04-29 06:30:02.919322",
            "laggy_probability": 0.651860,
            "laggy_interval": 20,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 20,
            "down_stamp": "2015-04-29 06:30:45.855010",
            "laggy_probability": 0.626463,
            "laggy_interval": 30,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 21,
            "down_stamp": "2015-04-29 06:30:02.919322",
            "laggy_probability": 0.653627,
            "laggy_interval": 9,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 22,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.666169,
            "laggy_interval": 12,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 23,
            "down_stamp": "2015-04-29 06:34:32.372745",
            "laggy_probability": 0.594888,
            "laggy_interval": 45,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 24,
            "down_stamp": "2015-04-29 06:45:16.246255",
            "laggy_probability": 0.193668,
            "laggy_interval": 10,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 25,
            "down_stamp": "2015-04-29 06:40:01.722875",
            "laggy_probability": 0.567685,
            "laggy_interval": 36,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 26,
            "down_stamp": "2015-04-29 06:40:42.614902",
            "laggy_probability": 0.601077,
            "laggy_interval": 92,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 27,
            "down_stamp": "2015-04-29 06:40:14.223004",
            "laggy_probability": 0.557502,
            "laggy_interval": 49,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 28,
            "down_stamp": "2015-04-29 06:40:01.722875",
            "laggy_probability": 0.635835,
            "laggy_interval": 27,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 29,
            "down_stamp": "2015-04-29 06:40:43.818245",
            "laggy_probability": 0.251127,
            "laggy_interval": 17,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 30,
            "down_stamp": "2015-04-26 14:21:27.940755",
            "laggy_probability": 0.606626,
            "laggy_interval": 30,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 31,
            "down_stamp": "2015-04-29 06:41:16.132199",
            "laggy_probability": 0.145557,
            "laggy_interval": 7,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 32,
            "down_stamp": "2015-04-29 06:40:06.732853",
            "laggy_probability": 0.568801,
            "laggy_interval": 37,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 33,
            "down_stamp": "2015-04-29 06:40:52.364979",
            "laggy_probability": 0.273623,
            "laggy_interval": 21,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 34,
            "down_stamp": "2015-04-29 06:45:19.569449",
            "laggy_probability": 0.233592,
            "laggy_interval": 36,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 35,
            "down_stamp": "2015-04-29 06:39:41.678784",
            "laggy_probability": 0.492127,
            "laggy_interval": 32,
            "features": 1125899906842623,
            "old_weight": 65536
        },
        {
            "osd": 36,
            "down_stamp": "2015-04-29 06:49:26.582575",
            "laggy_probability": 0.048084,
            "laggy_interval": 0,
            "features": 1125899906842623,
            "old_weight": 0
        },
        {
            "osd": 37,
            "down_stamp": "2015-04-29 06:49:30.662891",
            "laggy_probability": 0.140542,
            "laggy_interval": 7,
            "features": 1125899906842623,
            "old_weight": 65536
        }
    ],
    "pg_temp": [
        {
            "pgid": "0.31",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "0.ac",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "0.d4",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "0.169",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "0.1b8",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "0.1d2",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "0.1f0",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "0.600",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "0.855",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "0.87a",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "0.8d8",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "0.97a",
            "osds": [
                37,
                22,
                36
            ]
        },
        {
            "pgid": "0.a12",
            "osds": [
                36,
                28,
                37
            ]
        },
        {
            "pgid": "0.a1c",
            "osds": [
                37,
                14,
                36
            ]
        },
        {
            "pgid": "0.ad4",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "0.aef",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "0.b30",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "0.b7f",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "0.ba5",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "0.bc3",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "0.d9c",
            "osds": [
                37,
                9,
                36
            ]
        },
        {
            "pgid": "0.e16",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "0.e71",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "0.eab",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "0.ef9",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "0.f09",
            "osds": [
                36,
                35,
                37
            ]
        },
        {
            "pgid": "0.f32",
            "osds": [
                37,
                26,
                36
            ]
        },
        {
            "pgid": "0.f37",
            "osds": [
                37,
                18,
                36
            ]
        },
        {
            "pgid": "0.fc0",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "0.fd4",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "0.fdf",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "1.31",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "1.39",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "1.4c",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "1.68",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "1.6b",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "1.9f",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "1.dd",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "1.174",
            "osds": [
                37,
                23,
                36
            ]
        },
        {
            "pgid": "1.178",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "1.1c2",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "1.1f8",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "1.1fc",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "1.40a",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "1.4b3",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "1.53f",
            "osds": [
                37,
                16,
                36
            ]
        },
        {
            "pgid": "1.5cc",
            "osds": [
                37,
                16,
                36
            ]
        },
        {
            "pgid": "1.82b",
            "osds": [
                36,
                25,
                37
            ]
        },
        {
            "pgid": "1.90d",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "1.9ec",
            "osds": [
                36,
                32,
                37
            ]
        },
        {
            "pgid": "1.9ff",
            "osds": [
                34,
                37
            ]
        },
        {
            "pgid": "1.a6d",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "1.b76",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "1.b8a",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "1.c7a",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "1.cb9",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "1.ced",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "1.d05",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "1.d30",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "1.d7a",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "1.ddf",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "1.e0f",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "1.e4f",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "1.e97",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "1.efd",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "1.f2c",
            "osds": [
                37,
                22,
                36
            ]
        },
        {
            "pgid": "1.f3d",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "1.f4b",
            "osds": [
                34,
                31,
                36
            ]
        },
        {
            "pgid": "1.f9a",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "1.fca",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "2.76",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "2.c4",
            "osds": [
                34,
                37
            ]
        },
        {
            "pgid": "2.150",
            "osds": [
                33,
                24
            ]
        },
        {
            "pgid": "2.159",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "2.1b4",
            "osds": [
                34,
                37
            ]
        },
        {
            "pgid": "2.1cf",
            "osds": [
                29,
                24
            ]
        },
        {
            "pgid": "2.1fa",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "2.545",
            "osds": [
                33,
                24
            ]
        },
        {
            "pgid": "2.7e4",
            "osds": [
                31,
                24
            ]
        },
        {
            "pgid": "2.ab7",
            "osds": [
                29,
                24
            ]
        },
        {
            "pgid": "2.d25",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "2.dbd",
            "osds": [
                36,
                24
            ]
        },
        {
            "pgid": "2.e69",
            "osds": [
                34,
                24
            ]
        },
        {
            "pgid": "2.e8d",
            "osds": [
                31,
                24
            ]
        },
        {
            "pgid": "2.ef9",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "2.f50",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "2.f5f",
            "osds": [
                30,
                24
            ]
        },
        {
            "pgid": "2.f9b",
            "osds": [
                30,
                24
            ]
        },
        {
            "pgid": "2.fea",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "3.64",
            "osds": [
                37,
                18,
                36
            ]
        },
        {
            "pgid": "3.c6",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "3.f8",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "3.194",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "3.1a9",
            "osds": [
                36,
                27,
                37
            ]
        },
        {
            "pgid": "3.686",
            "osds": [
                37,
                16,
                36
            ]
        },
        {
            "pgid": "3.98f",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "3.a88",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "3.acb",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "3.ae0",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "3.b74",
            "osds": [
                37,
                18,
                36
            ]
        },
        {
            "pgid": "3.c0f",
            "osds": [
                37,
                14,
                36
            ]
        },
        {
            "pgid": "3.c50",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "3.c65",
            "osds": [
                37,
                9,
                36
            ]
        },
        {
            "pgid": "3.d05",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "3.d8f",
            "osds": [
                0,
                37,
                36
            ]
        },
        {
            "pgid": "3.de5",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "3.edd",
            "osds": [
                37,
                1,
                36
            ]
        },
        {
            "pgid": "3.ef5",
            "osds": [
                34,
                31,
                36
            ]
        },
        {
            "pgid": "3.ef6",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "3.ef7",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "3.f01",
            "osds": [
                37,
                26,
                36
            ]
        },
        {
            "pgid": "3.f34",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "3.f35",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "3.f47",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "3.f8f",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "3.fb6",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "3.fdb",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "4.5",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "4.34",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "4.3f",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "4.84",
            "osds": [
                34,
                37
            ]
        },
        {
            "pgid": "4.93",
            "osds": [
                37,
                32,
                36
            ]
        },
        {
            "pgid": "4.156",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "4.165",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "4.17b",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "4.17d",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "4.17e",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "4.182",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "4.194",
            "osds": [
                37,
                26,
                36
            ]
        },
        {
            "pgid": "4.1a3",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "4.1aa",
            "osds": [
                37,
                18,
                36
            ]
        },
        {
            "pgid": "4.1c1",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "4.1c2",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "4.1d6",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "4.649",
            "osds": [
                37,
                7,
                36
            ]
        },
        {
            "pgid": "4.703",
            "osds": [
                36,
                32,
                37
            ]
        },
        {
            "pgid": "4.73d",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "4.787",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "4.90e",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "4.a5a",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "4.ab2",
            "osds": [
                32,
                37,
                36
            ]
        },
        {
            "pgid": "4.ab3",
            "osds": [
                0,
                37,
                36
            ]
        },
        {
            "pgid": "4.ae8",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "4.bc7",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "4.c04",
            "osds": [
                33,
                24,
                37
            ]
        },
        {
            "pgid": "4.c10",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "4.c33",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "4.c46",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "4.d1b",
            "osds": [
                9,
                36,
                37
            ]
        },
        {
            "pgid": "4.d66",
            "osds": [
                37,
                23,
                36
            ]
        },
        {
            "pgid": "4.d73",
            "osds": [
                9,
                37,
                36
            ]
        },
        {
            "pgid": "4.dc4",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "4.e1a",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "4.e3c",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "4.e60",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "4.e80",
            "osds": [
                37,
                8,
                36
            ]
        },
        {
            "pgid": "4.e92",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "4.eb6",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "4.f08",
            "osds": [
                37,
                34
            ]
        },
        {
            "pgid": "4.f2e",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "4.f44",
            "osds": [
                37,
                2,
                36
            ]
        },
        {
            "pgid": "4.f46",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "4.f6f",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "4.fbc",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "4.ff1",
            "osds": [
                37,
                14,
                36
            ]
        },
        {
            "pgid": "4.ff6",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "4.ffc",
            "osds": [
                29,
                37
            ]
        },
        {
            "pgid": "6.62",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "6.90",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "6.191",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "6.2f5",
            "osds": [
                37,
                22,
                36
            ]
        },
        {
            "pgid": "6.6b8",
            "osds": [
                37,
                11,
                36
            ]
        },
        {
            "pgid": "6.6d1",
            "osds": [
                0,
                37,
                36
            ]
        },
        {
            "pgid": "6.809",
            "osds": [
                37,
                7,
                36
            ]
        },
        {
            "pgid": "6.968",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "6.996",
            "osds": [
                37,
                23,
                36
            ]
        },
        {
            "pgid": "6.99e",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "6.a2a",
            "osds": [
                37,
                14,
                36
            ]
        },
        {
            "pgid": "6.a35",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "6.aa5",
            "osds": [
                37,
                15,
                36
            ]
        },
        {
            "pgid": "6.aef",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "6.b3b",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "6.b41",
            "osds": [
                2,
                37,
                36
            ]
        },
        {
            "pgid": "6.bdc",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "6.c6b",
            "osds": [
                27,
                37,
                36
            ]
        },
        {
            "pgid": "6.cb1",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "6.cbb",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "6.dbd",
            "osds": [
                37,
                27,
                36
            ]
        },
        {
            "pgid": "6.e9f",
            "osds": [
                37,
                18,
                36
            ]
        },
        {
            "pgid": "6.ec5",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "6.f26",
            "osds": [
                33,
                37
            ]
        },
        {
            "pgid": "6.f7b",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "6.f8b",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "6.fda",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "6.fdc",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "6.fe0",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "7.3b",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "7.52",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "7.7e",
            "osds": [
                30
            ]
        },
        {
            "pgid": "7.87",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "7.a5",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "7.ea",
            "osds": [
                37,
                23,
                36
            ]
        },
        {
            "pgid": "7.161",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "7.163",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "7.1d4",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "7.1da",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "7.1dd",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "7.1f0",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "7.1fd",
            "osds": [
                34,
                37,
                24
            ]
        },
        {
            "pgid": "7.374",
            "osds": [
                37,
                22,
                36
            ]
        },
        {
            "pgid": "7.5ea",
            "osds": [
                37,
                23,
                36
            ]
        },
        {
            "pgid": "7.7f4",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "7.a31",
            "osds": [
                37,
                22,
                36
            ]
        },
        {
            "pgid": "7.a93",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "7.b2b",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "7.c34",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "7.c50",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "7.cd9",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "7.d1b",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "7.d66",
            "osds": [
                34,
                36
            ]
        },
        {
            "pgid": "7.e20",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "7.e8f",
            "osds": [
                29,
                36
            ]
        },
        {
            "pgid": "7.eaa",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "7.f0b",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "7.f48",
            "osds": [
                30,
                37
            ]
        },
        {
            "pgid": "7.fc2",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "7.fdd",
            "osds": [
                37,
                24
            ]
        },
        {
            "pgid": "8.11",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "8.12",
            "osds": [
                33,
                31
            ]
        },
        {
            "pgid": "8.18",
            "osds": [
                24,
                31
            ]
        },
        {
            "pgid": "8.1d",
            "osds": [
                30,
                31
            ]
        },
        {
            "pgid": "8.37",
            "osds": [
                31,
                34
            ]
        },
        {
            "pgid": "8.5a",
            "osds": [
                34,
                33
            ]
        },
        {
            "pgid": "8.7c",
            "osds": [
                31,
                36
            ]
        },
        {
            "pgid": "8.c0",
            "osds": [
                24,
                34
            ]
        },
        {
            "pgid": "8.c2",
            "osds": [
                34,
                24
            ]
        },
        {
            "pgid": "8.d3",
            "osds": [
                37,
                17,
                36
            ]
        },
        {
            "pgid": "8.e3",
            "osds": [
                29,
                24
            ]
        },
        {
            "pgid": "8.ed",
            "osds": [
                29,
                34
            ]
        },
        {
            "pgid": "8.103",
            "osds": [
                29,
                33
            ]
        },
        {
            "pgid": "8.146",
            "osds": [
                29,
                30
            ]
        },
        {
            "pgid": "8.160",
            "osds": [
                31,
                33
            ]
        },
        {
            "pgid": "8.16f",
            "osds": [
                29,
                24
            ]
        },
        {
            "pgid": "8.171",
            "osds": [
                24,
                37
            ]
        },
        {
            "pgid": "8.175",
            "osds": [
                34,
                30
            ]
        },
        {
            "pgid": "8.17e",
            "osds": [
                31,
                37
            ]
        },
        {
            "pgid": "8.182",
            "osds": [
                34,
                31
            ]
        },
        {
            "pgid": "8.18a",
            "osds": [
                30,
                36
            ]
        },
        {
            "pgid": "8.1a1",
            "osds": [
                33,
                34
            ]
        },
        {
            "pgid": "8.1a4",
            "osds": [
                24,
                30
            ]
        },
        {
            "pgid": "8.1ae",
            "osds": [
                30,
                31
            ]
        },
        {
            "pgid": "8.1b4",
            "osds": [
                33,
                31
            ]
        },
        {
            "pgid": "8.1c3",
            "osds": [
                30,
                24
            ]
        },
        {
            "pgid": "8.1c7",
            "osds": [
                33,
                31
            ]
        },
        {
            "pgid": "8.1ce",
            "osds": [
                24,
                36
            ]
        },
        {
            "pgid": "8.1d0",
            "osds": [
                29,
                30
            ]
        },
        {
            "pgid": "8.1f1",
            "osds": [
                31,
                29
            ]
        },
        {
            "pgid": "8.1f3",
            "osds": [
                24,
                29
            ]
        },
        {
            "pgid": "8.1f5",
            "osds": [
                34,
                29
            ]
        },
        {
            "pgid": "8.240",
            "osds": [
                30,
                24
            ]
        },
        {
            "pgid": "8.25d",
            "osds": [
                34,
                24
            ]
        },
        {
            "pgid": "8.26b",
            "osds": [
                31,
                33
            ]
        },
        {
            "pgid": "8.2ad",
            "osds": [
                34,
                24
            ]
        },
        {
            "pgid": "8.2c5",
            "osds": [
                31,
                24
            ]
        },
        {
            "pgid": "8.2e3",
            "osds": [
                33,
                36
            ]
        },
        {
            "pgid": "8.31b",
            "osds": [
                31,
                24
            ]
        },
        {
            "pgid": "8.36e",
            "osds": [
                37,
                11,
                36
            ]
        },
        {
            "pgid": "8.3c1",
            "osds": [
                34,
                24
            ]
        },
        {
            "pgid": "8.3c7",
            "osds": [
                33,
                24
            ]
        },
        {
            "pgid": "16.32a",
            "osds": [
                33,
                36
            ]
        }
    ],
    "primary_temp": [],
    "blacklist": [
        "2015-04-29 07:13:18.543543",
        "2015-04-29 07:11:05.620929",
        "2015-04-29 07:07:39.090155"
    ],
    "erasure_code_profiles": {
        "default": {
            "directory": "\/usr\/lib\/ceph\/erasure-code",
            "k": "2",
            "m": "1",
            "plugin": "jerasure",
            "technique": "reed_sol_van"
        }
    }
}
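
The tail of the dump above (the empty primary_temp list, the blacklist
entries and the stock "default" erasure-code profile) can also be read back
straight from the cluster without parsing the json.  A minimal sketch using
the standard CLI; nothing here is specific to this cluster apart from the
profile name "default", which exists on every cluster whether or not
erasure-coded pools are in use:

    # list blacklisted client addresses and their expiry times
    ceph osd blacklist ls

    # show the erasure-code profiles known to the monitors
    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get default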

[-- Attachment #3: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
       [not found]                                           ` <81216125e573cf00539f61cc090b282b-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
@ 2015-04-29 15:38                                             ` Sage Weil
       [not found]                                               ` <alpine.DEB.2.00.1504290838060.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-04-29 15:38 UTC (permalink / raw)
  To: Tuomas Juntunen
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
> Hi
> 
> I updated that version and it seems that something did happen, the osd's
> stayed up for a while and 'ceph status' got updated. But then in couple of
> minutes, they all went down the same way.
> 
> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
> one of the osd's with osd debug = 20,
> http://beta.xaasbox.com/ceph/ceph-osd.15.log

Sam mentioned that you had said earlier that this was not critical data?  
If not, I think the simplest thing is to just drop those pools.  The 
important thing (from my perspective at least :) is that we understand the 
root cause and can prevent this in the future.

sage
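
A minimal sketch of what dropping those pools could look like, assuming the
pool names img and images from the tiering commands earlier in the thread;
the sanity checks are worth running first, since pool deletion is
irreversible:

    # check what (if anything) is still stored in the pools
    rados df
    rados -p images ls | head

    # delete them; the pool name must be given twice plus the confirmation flag
    ceph osd pool delete images images --yes-i-really-really-mean-it
    ceph osd pool delete img img --yes-i-really-really-mean-it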


> 
> Thank you!
> 
> Br,
> Tuomas
> 
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org] 
> Sent: 28. huhtikuuta 2015 23:57
> To: Tuomas Juntunen
> Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
> 
> Hi Tuomas,
> 
> I've pushed an updated wip-hammer-snaps branch.  Can you please try it?  
> The build will appear here
> 
> 	
> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
> 
> (or a similar url; adjust for your distro).
> 
> Thanks!
> sage
> 
> 
> On Tue, 28 Apr 2015, Sage Weil wrote:
> 
> > [adding ceph-devel]
> > 
> > Okay, I see the problem.  This seems to be unrelated to the giant -> 
> > hammer move... it's a result of the tiering changes you made:
> > 
> > > > > > > > The following:
> > > > > > > > 
> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd 
> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay 
> > > > > > > > img images
> > 
> > Specifically, --force-nonempty bypassed important safety checks.
> > 
> > 1. images had snapshots (and removed_snaps)
> > 
> > 2. images was added as a tier *of* img, and img's removed_snaps was 
> > copied to images, clobbering the removed_snaps value (see
> > OSDMap::Incremental::propagate_snaps_to_tiers)
> > 
> > 3. tiering relation was undone, but removed_snaps was still gone
> > 
> > 4. on OSD startup, when we load the PG, removed_snaps is initialized 
> > with the older map.  later, in PGPool::update(), we assume that 
> > removed_snaps always grows (never shrinks) and we trigger an assert.
> > 
> > To fix this I think we need to do 2 things:
> > 
> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is 
> > probably a good thing anyway: once we know snaps are removed on all 
> > OSDs we can prune the interval_set in the OSDMap.  Maybe.
> > 
> > 2. Fix the mon to prevent this from happening, *even* when 
> > --force-nonempty is specified.  (This is the root cause.)
> > 
> > I've opened http://tracker.ceph.com/issues/11493 to track this.
> > 
> > sage
> > 
> >     
> > 
> > > > > > > > 
> > > > > > > > Idea was to make images as a tier to img, move data to img 
> > > > > > > > then change
> > > > > > > clients to use the new img pool.
> > > > > > > > 
> > > > > > > > Br,
> > > > > > > > Tuomas
> > > > > > > > 
> > > > > > > > > Can you explain exactly what you mean by:
> > > > > > > > >
> > > > > > > > > "Also I created one pool for tier to be able to move 
> > > > > > > > > data without
> > > > > > > outage."
> > > > > > > > >
> > > > > > > > > -Sam
> > > > > > > > > ----- Original Message -----
> > > > > > > > > From: "tuomas juntunen" 
> > > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer 
> > > > > > > > > and after some basic operations most of the OSD's went 
> > > > > > > > > down
> > > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > Any solution for this yet?
> > > > > > > > >
> > > > > > > > > Br,
> > > > > > > > > Tuomas
> > > > > > > > >
> > > > > > > > >> It looks like you may have hit
> > > > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > > > >>
> > > > > > > > >> Ian R. Colle
> > > > > > > > >> Global Director
> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of 
> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
> > > > > > > > >> http://www.twitter.com/ircolle
> > > > > > > > >> Cell: +1.303.601.7713
> > > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > > > > > > >>
> > > > > > > > >> ----- Original Message -----
> > > > > > > > >> From: "tuomas juntunen" 
> > > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and 
> > > > > > > > >> after some basic operations most of the OSD's went down
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > > > >>
> > > > > > > > >> Then created new pools and deleted some old ones. Also 
> > > > > > > > >> I created one pool for tier to be able to move data 
> > > > > > > > >> without
> > > outage.
> > > > > > > > >>
> > > > > > > > >> After these operations all but 10 OSD's are down and 
> > > > > > > > >> creating this kind of messages to logs, I get more than 
> > > > > > > > >> 100gb of these in a
> > > > > > night:
> > > > > > > > >>
> > > > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23
> > > pg_epoch:
> > > > 
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > > >> inactive NOTIFY] enter Started
> > > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > > >> inactive NOTIFY] enter Start
> > > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > > >> inactive NOTIFY]
> > > > > > > > >> state<Start>: transitioning to Stray
> > > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 
> > > > > > > > >> inactive NOTIFY] enter Started/Stray
> > > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
> > > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive] enter Started
> > > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive] enter Start
> > > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive]
> > > > > > > > >> state<Start>: transitioning to Primary
> > > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
> > > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive] enter Started/Primary
> > > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
> > > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > > 
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 
> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
> > > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > > > ./include/interval_set.h:
> > > > > > > > >> In
> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> > > snapid_t]' 
> > > > > > > > >> thread
> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 
> > > > > > > > >> 0)
> > > > > > > > >>
> > > > > > > > >>  ceph version 0.94.1
> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, 
> > > > > > > > >> int, char
> > > > > > > > >> const*)+0x8b)
> > > > > > > > >> [0xbc271b]
> > > > > > > > >>  2: 
> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
> > > > > > > > >> >
> > > > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>, 
> > > > > > > > >> const>std::vector<int,
> > > > > > > > >> std::allocator<int> >&, int, std::vector<int, 
> > > > > > > > >> std::allocator<int>
> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*, 
> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*, 
> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >, 
> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) 
> > > > > > > > >> [0x6b0e43]
> > > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
> > > > > > > > >> std::allocator<PG*>
> > > > > > > > >> > const&,
> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > > > >> std::allocator<PG*>
> > > > > > > > >> > const&,
> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > > > >> [0xbb38ae]
> > > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
> > > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > > > >>
> > > > > > > > >> Also by monitoring (ceph -w) I get the following 
> > > > > > > > >> messages, also lots of
> > > > > > > them.
> > > > > > > > >>
> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > > > 10.20.0.13:0/1174409'
> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush 
> > > > > > > > >> create-or-move",
> > > > "args":
> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
> 
> > > > > > > > >> dispatch
> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > > > 10.20.0.13:0/1174483'
> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush 
> > > > > > > > >> create-or-move",
> > > > "args":
> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
> 
> > > > > > > > >> dispatch
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are 
> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
> 14.04.2.
> > > > > > > > >>
> > > > > > > > >> I have pretty much tried everything I could think of.
> > > > > > > > >>
> > > > > > > > >> Restarting daemons doesn't help.
> > > > > > > > >>
> > > > > > > > >> Any help would be appreciated. I can also provide more 
> > > > > > > > >> logs if necessary. They just seem to get pretty large 
> > > > > > > > >> in few
> > > moments.
> > > > > > > > >>
> > > > > > > > >> Thank you
> > > > > > > > >> Tuomas
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> _______________________________________________
> > > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > ceph-users mailing list
> > > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > 
> > > > > > > > 
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > 
> 
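
A quick way to see the clobbered removed_snaps described in the quoted
analysis above, without digging through OSD logs, is to compare the pool
lines of the current osdmap against an older epoch.  A minimal sketch; the
epoch 16000 is only a placeholder and assumes the monitors still have that
epoch available:

    # current map: pool lines include their removed_snaps interval set, if any
    ceph osd dump | grep removed_snaps

    # fetch and print an older epoch of the map for comparison
    ceph osd getmap 16000 -o /tmp/osdmap.16000
    osdmaptool --print /tmp/osdmap.16000 | grep removed_snaps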

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
       [not found]                                               ` <alpine.DEB.2.00.1504290838060.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-30  3:31                                                 ` tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g
       [not found]                                                   ` <928ebb7320e4eb07f14071e997ed7be2-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g @ 2015-04-30  3:31 UTC (permalink / raw)
  To: Sage Weil
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Hey

Yes, I can drop the images data. Do you think this will fix it?


Br,

Tuomas
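
For reference, the usual teardown sequence for a cache-tier pair is sketched
below; the thread indicates the tier relation between img and images had
already been undone, so this is only for completeness, with the pool names
taken from the earlier commands:

    # stop new writes from being promoted into the cache pool, then flush it
    ceph osd tier cache-mode images forward
    rados -p images cache-flush-evict-all

    # detach the cache pool from the base pool
    ceph osd tier remove-overlay img
    ceph osd tier remove img images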

> On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
>> Hi
>>
>> I updated that version and it seems that something did happen, the osd's
>> stayed up for a while and 'ceph status' got updated. But then in couple of
>> minutes, they all went down the same way.
>>
>> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
>> one of the osd's with osd debug = 20,
>> http://beta.xaasbox.com/ceph/ceph-osd.15.log
>
> Sam mentioned that you had said earlier that this was not critical data?
> If not, I think the simplest thing is to just drop those pools.  The
> important thing (from my perspective at least :) is that we understand the
> root cause and can prevent this in the future.
>
> sage
>
>
>>
>> Thank you!
>>
>> Br,
>> Tuomas
>>
>>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org]
>> Sent: 28. huhtikuuta 2015 23:57
>> To: Tuomas Juntunen
>> Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
>> operations most of the OSD's went down
>>
>> Hi Tuomas,
>>
>> I've pushed an updated wip-hammer-snaps branch.  Can you please try it?
>> The build will appear here
>>
>>
>> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
>>
>> (or a similar url; adjust for your distro).
>>
>> Thanks!
>> sage
>>
>>
>> On Tue, 28 Apr 2015, Sage Weil wrote:
>>
>> > [adding ceph-devel]
>> >
>> > Okay, I see the problem.  This seems to be unrelated to the giant ->
>> > hammer move... it's a result of the tiering changes you made:
>> >
>> > > > > > > > The following:
>> > > > > > > >
>> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
>> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
>> > > > > > > > img images
>> >
>> > Specifically, --force-nonempty bypassed important safety checks.
>> >
>> > 1. images had snapshots (and removed_snaps)
>> >
>> > 2. images was added as a tier *of* img, and img's removed_snaps was
>> > copied to images, clobbering the removed_snaps value (see
>> > OSDMap::Incremental::propagate_snaps_to_tiers)
>> >
>> > 3. tiering relation was undone, but removed_snaps was still gone
>> >
>> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
>> > with the older map.  later, in PGPool::update(), we assume that
>> > removed_snaps always grows (never shrinks) and we trigger an assert.
>> >
>> > To fix this I think we need to do 2 things:
>> >
>> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
>> > probably a good thing anyway: once we know snaps are removed on all
>> > OSDs we can prune the interval_set in the OSDMap.  Maybe.
>> >
>> > 2. Fix the mon to prevent this from happening, *even* when
>> > --force-nonempty is specified.  (This is the root cause.)
>> >
>> > I've opened http://tracker.ceph.com/issues/11493 to track this.
>> >
>> > sage
>> >
>> >
>> >
>> > > > > > > >
>> > > > > > > > Idea was to make images as a tier to img, move data to img
>> > > > > > > > then change
>> > > > > > > clients to use the new img pool.
>> > > > > > > >
>> > > > > > > > Br,
>> > > > > > > > Tuomas
>> > > > > > > >
>> > > > > > > > > Can you explain exactly what you mean by:
>> > > > > > > > >
>> > > > > > > > > "Also I created one pool for tier to be able to move
>> > > > > > > > > data without
>> > > > > > > outage."
>> > > > > > > > >
>> > > > > > > > > -Sam
>> > > > > > > > > ----- Original Message -----
>> > > > > > > > > From: "tuomas juntunen"
>> > > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
>> > > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> > > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
>> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
>> > > > > > > > > and after some basic operations most of the OSD's went
>> > > > > > > > > down
>> > > > > > > > >
>> > > > > > > > > Hi
>> > > > > > > > >
>> > > > > > > > > Any solution for this yet?
>> > > > > > > > >
>> > > > > > > > > Br,
>> > > > > > > > > Tuomas
>> > > > > > > > >
>> > > > > > > > >> It looks like you may have hit
>> > > > > > > > >> http://tracker.ceph.com/issues/7915
>> > > > > > > > >>
>> > > > > > > > >> Ian R. Colle
>> > > > > > > > >> Global Director
>> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
>> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
>> > > > > > > > >> http://www.twitter.com/ircolle
>> > > > > > > > >> Cell: +1.303.601.7713
>> > > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
>> > > > > > > > >>
>> > > > > > > > >> ----- Original Message -----
>> > > > > > > > >> From: "tuomas juntunen"
>> > > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
>> > > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
>> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
>> > > > > > > > >> after some basic operations most of the OSD's went down
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>> > > > > > > > >>
>> > > > > > > > >> Then created new pools and deleted some old ones. Also
>> > > > > > > > >> I created one pool for tier to be able to move data
>> > > > > > > > >> without
>> > > outage.
>> > > > > > > > >>
>> > > > > > > > >> After these operations all but 10 OSD's are down and
>> > > > > > > > >> creating this kind of messages to logs, I get more than
>> > > > > > > > >> 100gb of these in a
>> > > > > > night:
>> > > > > > > > >>
>> > > > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23
>> > > pg_epoch:
>> > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
>> > > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
>> > > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
>> > > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
>> > > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
>> > > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
>> > > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] enter Started
>> > > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] enter Start
>> > > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY]
>> > > > > > > > >> state<Start>: transitioning to Stray
>> > > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
>> > > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] enter Started/Stray
>> > > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
>> > > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Started
>> > > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Start
>> > > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive]
>> > > > > > > > >> state<Start>: transitioning to Primary
>> > > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
>> > > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Started/Primary
>> > > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
>> > > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
>> > > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
>> > > > > > > ./include/interval_set.h:
>> > > > > > > > >> In
>> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
>> > > snapid_t]'
>> > > > > > > > >> thread
>> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
>> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
>> > > > > > > > >> 0)
>> > > > > > > > >>
>> > > > > > > > >>  ceph version 0.94.1
>> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> > > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*,
>> > > > > > > > >> int, char
>> > > > > > > > >> const*)+0x8b)
>> > > > > > > > >> [0xbc271b]
>> > > > > > > > >>  2:
>> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
>> > > > > > > > >> >
>> > > > > > > > >> const&)+0xb0) [0x82cd50]
>> > > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
>> > > > > > > > >> const>)+0x52e) [0x80113e]
>> > > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
>> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
>> > > > > > > > >> const>std::vector<int,
>> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
>> > > > > > > > >> std::allocator<int>
>> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
>> > > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*,
>> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
>> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
>> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
>> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
>> > > > > > > > >> [0x6b0e43]
>> > > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
>> > > > > > > > >> std::allocator<PG*>
>> > > > > > > > >> > const&,
>> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
>> > > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
>> > > > > > > > >> std::allocator<PG*>
>> > > > > > > > >> > const&,
>> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
>> > > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
>> > > > > > > > >> [0xbb38ae]
>> > > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
>> > > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
>> > > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
>> > > > > > > > >>
>> > > > > > > > >> Also by monitoring (ceph -w) I get the following
>> > > > > > > > >> messages, also lots of
>> > > > > > > them.
>> > > > > > > > >>
>> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
>> > > > > > > 10.20.0.13:0/1174409'
>> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
>> > > > > > > > >> create-or-move",
>> > > > "args":
>> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
>>
>> > > > > > > > >> dispatch
>> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
>> > > > > > > 10.20.0.13:0/1174483'
>> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
>> > > > > > > > >> create-or-move",
>> > > > "args":
>> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
>>
>> > > > > > > > >> dispatch
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
>> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
>> 14.04.2.
>> > > > > > > > >>
>> > > > > > > > >> I have pretty much tried everything I could think of.
>> > > > > > > > >>
>> > > > > > > > >> Restarting daemons doesn't help.
>> > > > > > > > >>
>> > > > > > > > >> Any help would be appreciated. I can also provide more
>> > > > > > > > >> logs if necessary. They just seem to get pretty large
>> > > > > > > > >> in few
>> > > moments.
>> > > > > > > > >>
>> > > > > > > > >> Thank you
>> > > > > > > > >> Tuomas
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> _______________________________________________
>> > > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > _______________________________________________
>> > > > > > > > > ceph-users mailing list
>> > > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > _______________________________________________
>> > > > > > > > ceph-users mailing list
>> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > _______________________________________________
>> > > > > > > > ceph-users mailing list
>> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
       [not found]                                                   ` <928ebb7320e4eb07f14071e997ed7be2-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
@ 2015-04-30 15:23                                                     ` Sage Weil
  0 siblings, 0 replies; 8+ messages in thread
From: Sage Weil @ 2015-04-30 15:23 UTC (permalink / raw)
  To: Tuomas Juntunen
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

On Thu, 30 Apr 2015, tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org wrote:
> Hey
> 
> Yes I can drop the images data, you think this will fix it?

It's a slightly different assert that (I believe) should not trigger once 
the pool is deleted.  Please give that a try and if you still hit it I'll 
whip up a workaround.

Thanks!
sage
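
A minimal sketch of what dropping those pools can look like on Hammer,
assuming the pool names from earlier in the thread (verify them with
'ceph osd lspools' and check 'ceph osd dump | grep removed_snaps' first):

    # undo whatever is left of the tiering relationship, if anything
    ceph osd tier remove-overlay img
    ceph osd tier remove img images

    # delete both pools; the name is given twice plus the safety flag
    ceph osd pool delete images images --yes-i-really-really-mean-it
    ceph osd pool delete img img --yes-i-really-really-mean-it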

 > 
> 
> Br,
> 
> Tuomas
> 
> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
> >> Hi
> >>
> >> I updated that version and it seems that something did happen, the osd's
> >> stayed up for a while and 'ceph status' got updated. But then in couple of
> >> minutes, they all went down the same way.
> >>
> >> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
> >> one of the osd's with osd debug = 20,
> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
> >
> > Sam mentioned that you had said earlier that this was not critical data?
> > If not, I think the simplest thing is to just drop those pools.  The
> > important thing (from my perspective at least :) is that we understand the
> > root cause and can prevent this in the future.
> >
> > sage
> >
> >
> >>
> >> Thank you!
> >>
> >> Br,
> >> Tuomas
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org]
> >> Sent: 28. huhtikuuta 2015 23:57
> >> To: Tuomas Juntunen
> >> Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> >> operations most of the OSD's went down
> >>
> >> Hi Tuomas,
> >>
> >> I've pushed an updated wip-hammer-snaps branch.  Can you please try it?
> >> The build will appear here
> >>
> >>
> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e
> >> 2eb514067f72afda11bcde286
> >>
> >> (or a similar url; adjust for your distro).
> >>
> >> Thanks!
> >> sage
> >>
> >>
> >> On Tue, 28 Apr 2015, Sage Weil wrote:
> >>
> >> > [adding ceph-devel]
> >> >
> >> > Okay, I see the problem.  This seems to be unrelated to the giant ->
> >> > hammer move... it's a result of the tiering changes you made:
> >> >
> >> > > > > > > > The following:
> >> > > > > > > >
> >> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
> >> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
> >> > > > > > > > img images
> >> >
> >> > Specifically, --force-nonempty bypassed important safety checks.
> >> >
> >> > 1. images had snapshots (and removed_snaps)
> >> >
> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
> >> > copied to images, clobbering the removed_snaps value (see
> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
> >> >
> >> > 3. tiering relation was undone, but removed_snaps was still gone
> >> >
> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
> >> > with the older map.  later, in PGPool::update(), we assume that
> >> > removed_snaps always grows (never shrinks) and we trigger an assert.
> >> >
> >> > To fix this I think we need to do 2 things:
> >> >
> >> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
> >> > probably a good thing anyway: once we know snaps are removed on all
> >> > OSDs we can prune the interval_set in the OSDMap.  Maybe.
> >> >
> >> > 2. Fix the mon to prevent this from happening, *even* when
> >> > --force-nonempty is specified.  (This is the root cause.)
> >> >
> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
> >> >
> >> > sage
> >> >
> >> >
> >> >
> >> > > > > > > >
> >> > > > > > > > Idea was to make images as a tier to img, move data to img
> >> > > > > > > > then change
> >> > > > > > > clients to use the new img pool.
> >> > > > > > > >
> >> > > > > > > > Br,
> >> > > > > > > > Tuomas
> >> > > > > > > >
> >> > > > > > > > > Can you explain exactly what you mean by:
> >> > > > > > > > >
> >> > > > > > > > > "Also I created one pool for tier to be able to move
> >> > > > > > > > > data without
> >> > > > > > > outage."
> >> > > > > > > > >
> >> > > > > > > > > -Sam
> >> > > > > > > > > ----- Original Message -----
> >> > > > > > > > > From: "tuomas juntunen"
> >> > > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> >> > > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> > > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
> >> > > > > > > > > and after some basic operations most of the OSD's went
> >> > > > > > > > > down
> >> > > > > > > > >
> >> > > > > > > > > Hi
> >> > > > > > > > >
> >> > > > > > > > > Any solution for this yet?
> >> > > > > > > > >
> >> > > > > > > > > Br,
> >> > > > > > > > > Tuomas
> >> > > > > > > > >
> >> > > > > > > > >> It looks like you may have hit
> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
> >> > > > > > > > >>
> >> > > > > > > > >> Ian R. Colle
> >> > > > > > > > >> Global Director
> >> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
> >> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
> >> > > > > > > > >> http://www.twitter.com/ircolle
> >> > > > > > > > >> Cell: +1.303.601.7713
> >> > > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> >> > > > > > > > >>
> >> > > > > > > > >> ----- Original Message -----
> >> > > > > > > > >> From: "tuomas juntunen"
> >> > > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> >> > > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
> >> > > > > > > > >> after some basic operations most of the OSD's went down
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> >> > > > > > > > >>
> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
> >> > > > > > > > >> I created one pool for tier to be able to move data
> >> > > > > > > > >> without
> >> > > outage.
> >> > > > > > > > >>
> >> > > > > > > > >> After these operations all but 10 OSD's are down and
> >> > > > > > > > >> creating this kind of messages to logs, I get more than
> >> > > > > > > > >> 100gb of these in a
> >> > > > > > night:
> >> > > > > > > > >>
> >> > > > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23
> >> > > pg_epoch:
> >> > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
> >> > > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
> >> > > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> >> > > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> >> > > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> >> > > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
> >> > > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] enter Started
> >> > > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] enter Start
> >> > > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY]
> >> > > > > > > > >> state<Start>: transitioning to Stray
> >> > > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
> >> > > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] enter Started/Stray
> >> > > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
> >> > > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Started
> >> > > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Start
> >> > > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive]
> >> > > > > > > > >> state<Start>: transitioning to Primary
> >> > > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
> >> > > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Started/Primary
> >> > > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
> >> > > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
> >> > > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> >> > > > > > > ./include/interval_set.h:
> >> > > > > > > > >> In
> >> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> >> > > snapid_t]'
> >> > > > > > > > >> thread
> >> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
> >> > > > > > > > >> 0)
> >> > > > > > > > >>
> >> > > > > > > > >>  ceph version 0.94.1
> >> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >> > > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*,
> >> > > > > > > > >> int, char
> >> > > > > > > > >> const*)+0x8b)
> >> > > > > > > > >> [0xbc271b]
> >> > > > > > > > >>  2:
> >> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
> >> > > > > > > > >> >
> >> > > > > > > > >> const&)+0xb0) [0x82cd50]
> >> > > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> >> > > > > > > > >> const>)+0x52e) [0x80113e]
> >> > > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> >> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
> >> > > > > > > > >> const>std::vector<int,
> >> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
> >> > > > > > > > >> std::allocator<int>
> >> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> >> > > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*,
> >> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
> >> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> >> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
> >> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
> >> > > > > > > > >> [0x6b0e43]
> >> > > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
> >> > > > > > > > >> std::allocator<PG*>
> >> > > > > > > > >> > const&,
> >> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> >> > > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
> >> > > > > > > > >> std::allocator<PG*>
> >> > > > > > > > >> > const&,
> >> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> >> > > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> >> > > > > > > > >> [0xbb38ae]
> >> > > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> >> > > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
> >> > > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> >> > > > > > > > >>
> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
> >> > > > > > > > >> messages, also lots of
> >> > > > > > > them.
> >> > > > > > > > >>
> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> >> > > > > > > 10.20.0.13:0/1174409'
> >> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
> >> > > > > > > > >> create-or-move",
> >> > > > "args":
> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
> >>
> >> > > > > > > > >> dispatch
> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> >> > > > > > > 10.20.0.13:0/1174483'
> >> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
> >> > > > > > > > >> create-or-move",
> >> > > > "args":
> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
> >>
> >> > > > > > > > >> dispatch
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
> >> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
> >> 14.04.2.
> >> > > > > > > > >>
> >> > > > > > > > >> I have pretty much tried everything I could think of.
> >> > > > > > > > >>
> >> > > > > > > > >> Restarting daemons doesn't help.
> >> > > > > > > > >>
> >> > > > > > > > >> Any help would be appreciated. I can also provide more
> >> > > > > > > > >> logs if necessary. They just seem to get pretty large
> >> > > > > > > > >> in few
> >> > > moments.
> >> > > > > > > > >>
> >> > > > > > > > >> Thank you
> >> > > > > > > > >> Tuomas
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> _______________________________________________
> >> > > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > _______________________________________________
> >> > > > > > > > > ceph-users mailing list
> >> > > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > _______________________________________________
> >> > > > > > > > ceph-users mailing list
> >> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > _______________________________________________
> >> > > > > > > > ceph-users mailing list
> >> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >> >
> >>
> >
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
  2015-05-01 16:04   ` Sage Weil
@ 2015-05-01 18:13     ` tuomas.juntunen
  0 siblings, 0 replies; 8+ messages in thread
From: tuomas.juntunen @ 2015-05-01 18:13 UTC (permalink / raw)
  To: Sage Weil; +Cc: tuomas.juntunen, ceph-users, ceph-devel

Thanks, I'll do this when the commit is available and report back.

And indeed, I'll switch back to the official packages once everything is OK.

Br,
Tuomas
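
Switching back to the stock packages is roughly the following on Ubuntu
14.04 once the test build is no longer needed (the repo file name and
package set here are assumptions; adjust to however the gitbuilder repo
was added, and pin an explicit version if apt considers the test build
newer than the official one):

    # drop the gitbuilder entry and point apt back at the hammer repo
    sudo rm /etc/apt/sources.list.d/ceph-gitbuilder.list
    echo "deb http://ceph.com/debian-hammer/ trusty main" | \
        sudo tee /etc/apt/sources.list.d/ceph.list
    sudo apt-get update
    sudo apt-get install --reinstall ceph ceph-common

    # then restart the daemons one node at a time
    sudo restart ceph-osd-all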

> On Fri, 1 May 2015, tuomas.juntunen@databasement.fi wrote:
>> Hi
>>
>> I deleted the images and img pools and started the osd's, but they still die.
>>
>> Here's a log of one of the osd's after this, if you need it.
>>
>> http://beta.xaasbox.com/ceph/ceph-osd.19.log
>
> I've pushed another commit that should avoid this case, sha1
> 425bd4e1dba00cc2243b0c27232d1f9740b04e34.
>
> Note that once the pools are fully deleted (shouldn't take too long once
> the osds are up and stabilize) you should switch back to the normal
> packages that don't have these workarounds.
>
> sage
>
>
>
>>
>> Br,
>> Tuomas
>>
>>
>> > Thanks man. I'll try it tomorrow. Have a good one.
>> >
>> > Br,T
>> >
>> > -------- Original message --------
>> > From: Sage Weil <sage@newdream.net>
>> > Date: 30/04/2015  18:23  (GMT+02:00)
>> > To: Tuomas Juntunen <tuomas.juntunen@databasement.fi>
>> > Cc: ceph-users@lists.ceph.com, ceph-devel@vger.kernel.org
>> > Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic
>>
>> > operations most of the OSD's went down
>> >
>> > On Thu, 30 Apr 2015, tuomas.juntunen@databasement.fi wrote:
>> >> Hey
>> >>
>> >> Yes I can drop the images data, you think this will fix it?
>> >
>> > It's a slightly different assert that (I believe) should not trigger once
>> > the pool is deleted.  Please give that a try and if you still hit it I'll
>> > whip up a workaround.
>> >
>> > Thanks!
>> > sage
>> >
>> >  >
>> >>
>> >> Br,
>> >>
>> >> Tuomas
>> >>
>> >> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
>> >> >> Hi
>> >> >>
>> >> >> I updated that version and it seems that something did happen, the osd's
>> >> >> stayed up for a while and 'ceph status' got updated. But then in couple
>> of
>> >> >> minutes, they all went down the same way.
>> >> >>
>> >> >> I have attached new 'ceph osd dump -f json-pretty' and got a new log
>> from
>> >> >> one of the osd's with osd debug = 20,
>> >> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
>> >> >
>> >> > Sam mentioned that you had said earlier that this was not critical data?
>> >> > If not, I think the simplest thing is to just drop those pools.  The
>> >> > important thing (from my perspective at least :) is that we understand
>> the
>> >> > root cause and can prevent this in the future.
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >>
>> >> >> Thank you!
>> >> >>
>> >> >> Br,
>> >> >> Tuomas
>> >> >>
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Sage Weil [mailto:sage@newdream.net]
>> >> >> Sent: 28. huhtikuuta 2015 23:57
>> >> >> To: Tuomas Juntunen
>> >> >> Cc: ceph-users@lists.ceph.com; ceph-devel@vger.kernel.org
>> >> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some
>> basic
>> >> >> operations most of the OSD's went down
>> >> >>
>> >> >> Hi Tuomas,
>> >> >>
>> >> >> I've pushed an updated wip-hammer-snaps branch.  Can you please try it?
>> >> >> The build will appear here
>> >> >>
>> >> >>
>> >> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e
>> >> >> 2eb514067f72afda11bcde286
>> >> >>
>> >> >> (or a similar url; adjust for your distro).
>> >> >>
>> >> >> Thanks!
>> >> >> sage
>> >> >>
>> >> >>
>> >> >> On Tue, 28 Apr 2015, Sage Weil wrote:
>> >> >>
>> >> >> > [adding ceph-devel]
>> >> >> >
>> >> >> > Okay, I see the problem.  This seems to be unrelated to the giant ->
>> >> >> > hammer move... it's a result of the tiering changes you made:
>> >> >> >
>> >> >> > > > > > > > The following:
>> >> >> > > > > > > >
>> >> >> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
>> >> >> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
>> >> >> > > > > > > > img images
>> >> >> >
>> >> >> > Specifically, --force-nonempty bypassed important safety checks.
>> >> >> >
>> >> >> > 1. images had snapshots (and removed_snaps)
>> >> >> >
>> >> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
>> >> >> > copied to images, clobbering the removed_snaps value (see
>> >> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
>> >> >> >
>> >> >> > 3. tiering relation was undone, but removed_snaps was still gone
>> >> >> >
>> >> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
>> >> >> > with the older map.  later, in PGPool::update(), we assume that
>> >> >> > removed_snaps always grows (never shrinks) and we trigger an assert.
>> >> >> >
>> >> >> > To fix this I think we need to do 2 things:
>> >> >> >
>> >> >> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
>> >> >> > probably a good thing anyway: once we know snaps are removed on all
>> >> >> > OSDs we can prune the interval_set in the OSDMap.  Maybe.
>> >> >> >
>> >> >> > 2. Fix the mon to prevent this from happening, *even* when
>> >> >> > --force-nonempty is specified.  (This is the root cause.)
>> >> >> >
>> >> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > > > > > > >
>> >> >> > > > > > > > Idea was to make images as a tier to img, move data to img
>> >> >> > > > > > > > then change
>> >> >> > > > > > > clients to use the new img pool.
>> >> >> > > > > > > >
>> >> >> > > > > > > > Br,
>> >> >> > > > > > > > Tuomas
>> >> >> > > > > > > >
>> >> >> > > > > > > > > Can you explain exactly what you mean by:
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > "Also I created one pool for tier to be able to move
>> >> >> > > > > > > > > data without
>> >> >> > > > > > > outage."
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > -Sam
>> >> >> > > > > > > > > ----- Original Message -----
>> >> >> > > > > > > > > From: "tuomas juntunen"
>> >> >> > > > > > > > > <tuomas.juntunen@databasement.fi>
>> >> >> > > > > > > > > To: "Ian Colle" <icolle@redhat.com>
>> >> >> > > > > > > > > Cc: ceph-users@lists.ceph.com
>> >> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
>> >> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
>> >> >> > > > > > > > > and after some basic operations most of the OSD's went
>> >> >> > > > > > > > > down
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > Hi
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > Any solution for this yet?
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > Br,
>> >> >> > > > > > > > > Tuomas
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >> It looks like you may have hit
>> >> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Ian R. Colle
>> >> >> > > > > > > > >> Global Director
>> >> >> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
>> >> >> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
>> >> >> > > > > > > > >> http://www.twitter.com/ircolle
>> >> >> > > > > > > > >> Cell: +1.303.601.7713
>> >> >> > > > > > > > >> Email: icolle@redhat.com
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> ----- Original Message -----
>> >> >> > > > > > > > >> From: "tuomas juntunen"
>> >> >> > > > > > > > >> <tuomas.juntunen@databasement.fi>
>> >> >> > > > > > > > >> To: ceph-users@lists.ceph.com
>> >> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
>> >> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
>> >> >> > > > > > > > >> after some basic operations most of the OSD's went down
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
>> >> >> > > > > > > > >> I created one pool for tier to be able to move data
>> >> >> > > > > > > > >> without
>> >> >> > > outage.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> After these operations all but 10 OSD's are down and
>> >> >> > > > > > > > >> creating this kind of messages to logs, I get more than
>> >> >> > > > > > > > >> 100gb of these in a
>> >> >> > > > > > night:
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>  -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5
>> osd.23
>> >> >> > > pg_epoch:
>> >> >> > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
>> >> >> > > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
>> >> >> > > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to
>> Stray
>> >> >> > > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
>> >> >> > > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
>> >> >> > > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
>> >> >> > > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] enter Started
>> >> >> > > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] enter Start
>> >> >> > > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY]
>> >> >> > > > > > > > >> state<Start>: transitioning to Stray
>> >> >> > > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
>> >> >> > > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] enter Started/Stray
>> >> >> > > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
>> >> >> > > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Started
>> >> >> > > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Start
>> >> >> > > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive]
>> >> >> > > > > > > > >> state<Start>: transitioning to Primary
>> >> >> > > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
>> >> >> > > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Started/Primary
>> >> >> > > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
>> >> >> > > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
>> >> >> > > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
>> >> >> > > > > > > ./include/interval_set.h:
>> >> >> > > > > > > > >> In
>> >> >> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
>> >> >> > > snapid_t]'
>> >> >> > > > > > > > >> thread
>> >> >> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
>> >> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
>> >> >> > > > > > > > >> 0)
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>  ceph version 0.94.1
>> >> >> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> >> >> > > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char
>> const*,
>> >> >> > > > > > > > >> int, char
>> >> >> > > > > > > > >> const*)+0x8b)
>> >> >> > > > > > > > >> [0xbc271b]
>> >> >> > > > > > > > >>  2:
>> >> >> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
>> >> >> > > > > > > > >> >
>> >> >> > > > > > > > >> const&)+0xb0) [0x82cd50]
>> >> >> > > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap
>> >> >> > > > > > > > >> const>)+0x52e) [0x80113e]
>> >> >> > > > > > > > >>  4:
>> (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
>> >> >> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
>> >> >> > > > > > > > >> const>std::vector<int,
>> >> >> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
>> >> >> > > > > > > > >> std::allocator<int>
>> >> >> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
>> >> >> > > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*,
>> >> >> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
>> >> >> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
>> >> >> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
>> >> >> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
>> >> >> > > > > > > > >> [0x6b0e43]
>> >> >> > > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*,
>> >> >> > > > > > > > >> std::allocator<PG*>
>> >> >> > > > > > > > >> > const&,
>> >> >> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
>> >> >> > > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*,
>> >> >> > > > > > > > >> std::allocator<PG*>
>> >> >> > > > > > > > >> > const&,
>> >> >> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
>> >> >> > > > > > > > >>  8:
>> (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
>> >> >> > > > > > > > >> [0xbb38ae]
>> >> >> > > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
>> >> >> > > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
>> >> >> > > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
>> >> >> > > > > > > > >> messages, also lots of
>> >> >> > > > > > > them.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
>> >> >> > > > > > > 10.20.0.13:0/1174409'
>> >> >> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
>> >> >> > > > > > > > >> create-or-move",
>> >> >> > > > "args":
>> >> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight":
>> >> 1.82}]:
>> >> >>
>> >> >> > > > > > > > >> dispatch
>> >> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
>> >> >> > > > > > > 10.20.0.13:0/1174483'
>> >> >> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
>> >> >> > > > > > > > >> create-or-move",
>> >> >> > > > "args":
>> >> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight":
>> >> 1.82}]:
>> >> >>
>> >> >> > > > > > > > >> dispatch
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
>> >> >> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
>> >> >> 14.04.2.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> I have pretty much tried everything I could think of.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Restarting daemons doesn't help.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Any help would be appreciated. I can also provide more
>> >> >> > > > > > > > >> logs if necessary. They just seem to get pretty large
>> >> >> > > > > > > > >> in few
>> >> >> > > moments.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Thank you
>> >> >> > > > > > > > >> Tuomas
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> _______________________________________________
>> >> >> > > > > > > > >> ceph-users mailing list ceph-users@lists.ceph.com
>> >> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > _______________________________________________
>> >> >> > > > > > > > > ceph-users mailing list
>> >> >> > > > > > > > > ceph-users@lists.ceph.com
>> >> >> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > > _______________________________________________
>> >> >> > > > > > > > ceph-users mailing list
>> >> >> > > > > > > > ceph-users@lists.ceph.com
>> >> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > > _______________________________________________
>> >> >> > > > > > > > ceph-users mailing list
>> >> >> > > > > > > > ceph-users@lists.ceph.com
>> >> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > >
>> >> >> > > > > >
>> >> >> > > > > >
>> >> >> > > > >
>> >> >> > > > >
>> >> >> > > >
>> >> >> > >
>> >> >> > >
>> >> >> > _______________________________________________
>> >> >> > ceph-users mailing list
>> >> >> > ceph-users@lists.ceph.com
>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> >
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >>
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
  2015-04-30 17:27 tuomas.juntunen
@ 2015-05-01 15:10 ` tuomas.juntunen
  2015-05-01 16:04   ` Sage Weil
  0 siblings, 1 reply; 8+ messages in thread
From: tuomas.juntunen @ 2015-05-01 15:10 UTC (permalink / raw)
  To: tuomas.juntunen; +Cc: Sage Weil, ceph-users, ceph-devel

Hi

I deleted the images and img pools and started the osd's, but they still die.

Here's a log of one of the osd's after this, if you need it.

http://beta.xaasbox.com/ceph/ceph-osd.19.log

Br,
Tuomas
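
For reference, a log like that is captured by raising the debug levels in
ceph.conf on the affected node and restarting just that daemon (the OSD id,
paths and upstart job name assume a stock Hammer install on Ubuntu 14.04):

    # /etc/ceph/ceph.conf
    [osd]
        debug osd = 20
        debug ms = 1

    # restart the single OSD and collect its log
    sudo restart ceph-osd id=19
    less /var/log/ceph/ceph-osd.19.log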


> Thanks man. I'll try it tomorrow. Have a good one.
>
> Br,T
>
> -------- Original message --------
> From: Sage Weil <sage@newdream.net>
> Date: 30/04/2015  18:23  (GMT+02:00)
> To: Tuomas Juntunen <tuomas.juntunen@databasement.fi>
> Cc: ceph-users@lists.ceph.com, ceph-devel@vger.kernel.org
> Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic

> operations most of the OSD's went down
>
> On Thu, 30 Apr 2015, tuomas.juntunen@databasement.fi wrote:
>> Hey
>>
>> Yes I can drop the images data, you think this will fix it?
>
> It's a slightly different assert that (I believe) should not trigger once
> the pool is deleted.  Please give that a try and if you still hit it I'll
> whip up a workaround.
>
> Thanks!
> sage
>
>  >
>>
>> Br,
>>
>> Tuomas
>>
>> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
>> >> Hi
>> >>
>> >> I updated that version and it seems that something did happen, the osd's
>> >> stayed up for a while and 'ceph status' got updated. But then in couple of
>> >> minutes, they all went down the same way.
>> >>
>> >> I have attached a new 'ceph osd dump -f json-pretty' and got a new log from
>> >> one of the OSDs with osd debug = 20:
>> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
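
(For context, that debug level is usually raised either at runtime, e.g.

  ceph tell osd.15 injectargs '--debug-osd 20 --debug-ms 1'

or by setting "debug osd = 20" in the [osd] section of ceph.conf and restarting
the daemon. The exact method used for the log above is an assumption; osd.15 is
simply the OSD whose log is linked.)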
>> >
>> > Sam mentioned that you had said earlier that this was not critical data?
>> > If it isn't, I think the simplest thing is to just drop those pools.  The
>> > important thing (from my perspective at least :) is that we understand the
>> > root cause and can prevent this in the future.
>> >
>> > sage
>> >
>> >
>> >>
>> >> Thank you!
>> >>
>> >> Br,
>> >> Tuomas
>> >>
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Sage Weil [mailto:sage@newdream.net]
>> >> Sent: 28 April 2015 23:57
>> >> To: Tuomas Juntunen
>> >> Cc: ceph-users@lists.ceph.com; ceph-devel@vger.kernel.org
>> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
>> >> operations most of the OSD's went down
>> >>
>> >> Hi Tuomas,
>> >>
>> >> I've pushed an updated wip-hammer-snaps branch.  Can you please try it?
>> >> The build will appear here
>> >>
>> >>
>> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
>> >>
>> >> (or a similar url; adjust for your distro).
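
(A sketch of how such a gitbuilder build is typically installed on Ubuntu
trusty; the repository layout and package set here are assumptions, so adjust
the URL and release name for your distro:

  echo deb http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286 trusty main | sudo tee /etc/apt/sources.list.d/ceph-wip.list
  sudo apt-get update && sudo apt-get upgrade
  sudo restart ceph-osd-all    # or restart the OSDs one at a time
)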
>> >>
>> >> Thanks!
>> >> sage
>> >>
>> >>
>> >> On Tue, 28 Apr 2015, Sage Weil wrote:
>> >>
>> >> > [adding ceph-devel]
>> >> >
>> >> > Okay, I see the problem.  This seems to be unrelated to the giant ->
>> >> > hammer move... it's a result of the tiering changes you made:
>> >> >
>> >> > > > > > > > The following:
>> >> > > > > > > >
>> >> > > > > > > > ceph osd tier add img images --force-nonempty
>> >> > > > > > > > ceph osd tier cache-mode images forward
>> >> > > > > > > > ceph osd tier set-overlay img images
>> >> >
>> >> > Specifically, --force-nonempty bypassed important safety checks.
>> >> >
>> >> > 1. images had snapshots (and removed_snaps)
>> >> >
>> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
>> >> > copied to images, clobbering the removed_snaps value (see
>> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
>> >> >
>> >> > 3. tiering relation was undone, but removed_snaps was still gone
>> >> >
>> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
>> >> > with the older map.  later, in PGPool::update(), we assume that
>> >> > removed_snaps always grows (never shrinks) and we trigger an assert.
>> >> >
>> >> > To fix this I think we need to do 2 things:
>> >> >
>> >> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
>> >> > probably a good thing anyway: once we know snaps are removed on all
>> >> > OSDs we can prune the interval_set in the OSDMap.  Maybe.
>> >> >
>> >> > 2. Fix the mon to prevent this from happening, *even* when
>> >> > --force-nonempty is specified.  (This is the root cause.)
>> >> >
>> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
>> >> >
>> >> > sage
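
(A quick way to look at the removed_snaps interval sets being described here
is the pool lines of the osdmap dump, for example:

  ceph osd dump | grep removed_snaps
  ceph osd dump -f json-pretty | less    # per-pool "removed_snaps" fields

Output format varies by release; this is just a convenient way to compare the
base and cache pool values.)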
>> >> >
>> >> >
>> >> >
>> >> > > > > > > >
>> >> > > > > > > > Idea was to make images as a tier to img, move data to img
>> >> > > > > > > > then change clients to use the new img pool.
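
(For completeness, the general shape of such a cache-tier based migration, once
the tier is attached and in forward mode, is to flush everything down and then
detach. This is a sketch only, and it is exactly the scenario the bypassed
safety checks are meant to protect against, since the source pool here had
snapshots:

  rados -p images cache-flush-evict-all
  ceph osd tier remove-overlay img
  ceph osd tier remove img images
)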
>> >> > > > > > > >
>> >> > > > > > > > Br,
>> >> > > > > > > > Tuomas
>> >> > > > > > > >
>> >> > > > > > > > > Can you explain exactly what you mean by:
>> >> > > > > > > > >
>> >> > > > > > > > > "Also I created one pool for tier to be able to move
>> >> > > > > > > > > data without
>> >> > > > > > > outage."
>> >> > > > > > > > >
>> >> > > > > > > > > -Sam
>> >> > > > > > > > > ----- Original Message -----
>> >> > > > > > > > > From: "tuomas juntunen"
>> >> > > > > > > > > <tuomas.juntunen@databasement.fi>
>> >> > > > > > > > > To: "Ian Colle" <icolle@redhat.com>
>> >> > > > > > > > > Cc: ceph-users@lists.ceph.com
>> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
>> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
>> >> > > > > > > > > and after some basic operations most of the OSD's went
>> >> > > > > > > > > down
>> >> > > > > > > > >
>> >> > > > > > > > > Hi
>> >> > > > > > > > >
>> >> > > > > > > > > Any solution for this yet?
>> >> > > > > > > > >
>> >> > > > > > > > > Br,
>> >> > > > > > > > > Tuomas
>> >> > > > > > > > >
>> >> > > > > > > > >> It looks like you may have hit
>> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
>> >> > > > > > > > >>
>> >> > > > > > > > >> Ian R. Colle
>> >> > > > > > > > >> Global Director
>> >> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
>> >> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
>> >> > > > > > > > >> http://www.twitter.com/ircolle
>> >> > > > > > > > >> Cell: +1.303.601.7713
>> >> > > > > > > > >> Email: icolle@redhat.com
>> >> > > > > > > > >>
>> >> > > > > > > > >> ----- Original Message -----
>> >> > > > > > > > >> From: "tuomas juntunen"
>> >> > > > > > > > >> <tuomas.juntunen@databasement.fi>
>> >> > > > > > > > >> To: ceph-users@lists.ceph.com
>> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
>> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
>> >> > > > > > > > >> after some basic operations most of the OSD's went down
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>> >> > > > > > > > >>
>> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
>> >> > > > > > > > >> I created one pool for tier to be able to move data
>> >> > > > > > > > >> without outage.
>> >> > > > > > > > >>
>> >> > > > > > > > >> After these operations all but 10 OSDs are down and
>> >> > > > > > > > >> writing this kind of messages to the logs; I get more than
>> >> > > > > > > > >> 100 GB of these in a night:
>> >> > > > > > > > >>
>> >> > > > > > > > >>    -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started
>> >> > > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Start
>> >> > > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
>> >> > > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
>> >> > > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started/Stray
>> >> > > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Reset 0.119467 4 0.000037
>> >> > > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started
>> >> > > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Start
>> >> > > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
>> >> > > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Start 0.000020 0 0.000000
>> >> > > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started/Stray
>> >> > > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Reset 7.511623 45 0.000165
>> >> > > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started
>> >> > > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Start
>> >> > > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
>> >> > > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
>> >> > > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
>> >> > > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
>> >> > > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
>> >> > > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1 ./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = snapid_t]' thread 7fd8e748d700 time 2015-04-27 10:17:08.809899
>> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
>> >> > > > > > > > >>
>> >> > > > > > > > >>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> >> > > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
>> >> > > > > > > > >>  2: (interval_set<snapid_t>::subtract(interval_set<snapid_t> const&)+0xb0) [0x82cd50]
>> >> > > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap const>)+0x52e) [0x80113e]
>> >> > > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
>> >> > > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) [0x6b0e43]
>> >> > > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x21c) [0x6b191c]
>> >> > > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x709278]
>> >> > > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
>> >> > > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
>> >> > > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
>> >> > > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
>> >> > > > > > > > >>
>> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
>> >> > > > > > > > >> messages, also lots of them.
>> >> > > > > > > > >>
>> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.? 10.20.0.13:0/1174409' entity='osd.30' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: dispatch
>> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.? 10.20.0.13:0/1174483' entity='osd.26' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: dispatch
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSDs; the nodes are
>> >> > > > > > > > >> also mons and MDSs to save servers. All run Ubuntu 14.04.2.
>> >> > > > > > > > >>
>> >> > > > > > > > >> I have pretty much tried everything I could think of.
>> >> > > > > > > > >>
>> >> > > > > > > > >> Restarting daemons doesn't help.
>> >> > > > > > > > >>
>> >> > > > > > > > >> Any help would be appreciated. I can also provide more
>> >> > > > > > > > >> logs if necessary. They just seem to get pretty large
>> >> > > > > > > > >> in a few moments.
>> >> > > > > > > > >>
>> >> > > > > > > > >> Thank you
>> >> > > > > > > > >> Tuomas
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >> _______________________________________________
>> >> > > > > > > > >> ceph-users mailing list ceph-users@lists.ceph.com
>> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > > > > > > > >>
>> >> > _______________________________________________
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >
>> >> >
>> >>
>> >
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-01 18:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <479273764e377f37b81dc6b0ccd55fb3@mail.meizo.com>
     [not found] ` <770484917.5624554.1430133524268.JavaMail.zimbra@redhat.com>
     [not found]   ` <813bbcbbf7d7e7ab4a8e2dba2e5cf6a2@mail.meizo.com>
     [not found]     ` <1551034631.7094890.1430134900209.JavaMail.zimbra@redhat.com>
     [not found]       ` <964da36ebed90592d8f5794ac2617a36@mail.meizo.com>
     [not found]         ` <1226598674.7136470.1430138991322.JavaMail.zimbra@redhat.com>
     [not found]           ` <76bac95ebd000308018bf900d11fae1e@mail.meizo.com>
     [not found]             ` <alpine.DEB.2.00.1504270919020.5458@cobra.newdream.net>
     [not found]               ` <03cd5dfba8f5fec3f80458a92d377a60@mail.meizo.com>
     [not found]                 ` <alpine.DEB.2.00.1504271034560.5458@cobra.newdream.net>
     [not found]                   ` <a06d58aa527edec6225737f18abb055b@mail.meizo.com>
     [not found]                     ` <alpine.DEB.2.00.1504271222002.5458@cobra.newdream.net>
     [not found]                       ` <8bed4ff8a05a8b96ed848e9f1aafa576@mail.meizo.com>
     [not found]                         ` <alpine.DEB.2.00.1504280959280.5458@cobra.newdream.net>
     [not found]                           ` <bb760e0f01a667a582f6bda67cc31684@mail.meizo.com>
     [not found]                             ` <alpine.DEB.2.00.1504281155530.5458@cobra.newdream.net>
     [not found]                               ` <f9adb4b2dcada947f418b6f95ad7a8d1@mail.meizo.com>
2015-04-28 20:19                                 ` [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down Sage Weil
     [not found]                                   ` <alpine.DEB.2.00.1504281256440.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-28 20:57                                     ` Sage Weil
     [not found]                                       ` <alpine.DEB.2.00.1504281355130.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-29  4:16                                         ` Tuomas Juntunen
     [not found]                                           ` <81216125e573cf00539f61cc090b282b-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
2015-04-29 15:38                                             ` Sage Weil
     [not found]                                               ` <alpine.DEB.2.00.1504290838060.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-30  3:31                                                 ` tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g
     [not found]                                                   ` <928ebb7320e4eb07f14071e997ed7be2-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
2015-04-30 15:23                                                     ` Sage Weil
2015-04-30 17:27 tuomas.juntunen
2015-05-01 15:10 ` [ceph-users] " tuomas.juntunen
2015-05-01 16:04   ` Sage Weil
2015-05-01 18:13     ` [ceph-users] " tuomas.juntunen
