* RE: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
[not found] ` <f9adb4b2dcada947f418b6f95ad7a8d1@mail.meizo.com>
@ 2015-04-28 20:19 ` Sage Weil
[not found] ` <alpine.DEB.2.00.1504281256440.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-04-28 20:19 UTC (permalink / raw)
To: Tuomas Juntunen; +Cc: ceph-users, ceph-devel
[adding ceph-devel]
Okay, I see the problem. This seems to be unrelated to the giant ->
hammer move... it's a result of the tiering changes you made:
> > > > > > The following:
> > > > > >
> > > > > > ceph osd tier add img images --force-nonempty
> > > > > > ceph osd tier cache-mode images forward
> > > > > > ceph osd tier set-overlay img images
Specifically, --force-nonempty bypassed important safety checks.
1. images had snapshots (and removed_snaps)
2. images was added as a tier *of* img, and img's removed_snaps was copied
to images, clobbering the removed_snaps value (see
OSDMap::Incremental::propagate_snaps_to_tiers)
3. tiering relation was undone, but removed_snaps was still gone
4. on OSD startup, when we load the PG, removed_snaps is initialized with
the older map. later, in PGPool::update(), we assume that removed_snaps
always grows (never shrinks) and we trigger an assert.
To fix this I think we need to do 2 things:
1. make the OSD forgiving about removed_snaps getting smaller (a rough sketch follows below). This is
probably a good thing anyway: once we know snaps are removed on all OSDs
we can prune the interval_set in the OSDMap. Maybe.
2. Fix the mon to prevent this from happening, *even* when
--force-nonempty is specified. (This is the root cause.)
I've opened http://tracker.ceph.com/issues/11493 to track this.
sage
> > > > > >
> > > > > > Idea was to make images as a tier to img, move data to img
> > > > > > then change
> > > > > clients to use the new img pool.
> > > > > >
> > > > > > Br,
> > > > > > Tuomas
> > > > > >
> > > > > > > Can you explain exactly what you mean by:
> > > > > > >
> > > > > > > "Also I created one pool for tier to be able to move data
> > > > > > > without
> > > > > outage."
> > > > > > >
> > > > > > > -Sam
> > > > > > > ----- Original Message -----
> > > > > > > From: "tuomas juntunen" <tuomas.juntunen@databasement.fi>
> > > > > > > To: "Ian Colle" <icolle@redhat.com>
> > > > > > > Cc: ceph-users@lists.ceph.com
> > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and
> > > > > > > after some basic operations most of the OSD's went down
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > Any solution for this yet?
> > > > > > >
> > > > > > > Br,
> > > > > > > Tuomas
> > > > > > >
> > > > > > >> It looks like you may have hit
> > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > >>
> > > > > > >> Ian R. Colle
> > > > > > >> Global Director
> > > > > > >> of Software Engineering
> > > > > > >> Red Hat (Inktank is now part of Red Hat!)
> > > > > > >> http://www.linkedin.com/in/ircolle
> > > > > > >> http://www.twitter.com/ircolle
> > > > > > >> Cell: +1.303.601.7713
> > > > > > >> Email: icolle@redhat.com
> > > > > > >>
> > > > > > >> ----- Original Message -----
> > > > > > >> From: "tuomas juntunen" <tuomas.juntunen@databasement.fi>
> > > > > > >> To: ceph-users@lists.ceph.com
> > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
> > > > > > >> after some basic operations most of the OSD's went down
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > >>
> > > > > > >> Then created new pools and deleted some old ones. Also I
> > > > > > >> created one pool for tier to be able to move data without
> outage.
> > > > > > >>
> > > > > > >> After these operations all but 10 OSD's are down and
> > > > > > >> creating this kind of messages to logs, I get more than
> > > > > > >> 100gb of these in a
> > > > night:
> > > > > > >>
> > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
> pg_epoch:
> >
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > >> ec=1 les/c
> > > > > > >> 16609/16659
> > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > >> pi=15659-16589/42
> > > > > > >> crt=8480'7 lcod
> > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > >> NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > >> NOTIFY] enter Started
> > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > >> NOTIFY] enter Start
> > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > >> NOTIFY]
> > > > > > >> state<Start>: transitioning to Stray
> > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > >> NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > >> 17879/17879
> > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > >> NOTIFY] enter Started/Stray
> > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive] exit Reset 7.511623 45 0.000165
> > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive] enter Started
> > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive] enter Start
> > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive]
> > > > > > >> state<Start>: transitioning to Primary
> > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive] exit Start 0.000023 0 0.000000
> > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive] enter Started/Primary
> > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> inactive] enter Started/Primary/Peering
> > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > >> 16127/16344
> > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > >> peering] enter Started/Primary/Peering/GetInfo
> > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > ./include/interval_set.h:
> > > > > > >> In
> > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> snapid_t]'
> > > > > > >> thread
> > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> > > > > > >>
> > > > > > >> ceph version 0.94.1
> > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
> > > > > > >> int, char
> > > > > > >> const*)+0x8b)
> > > > > > >> [0xbc271b]
> > > > > > >> 2:
> > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t>
> > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
> > > > > > >> const>std::vector<int,
> > > > > > >> std::allocator<int> >&, int, std::vector<int,
> > > > > > >> std::allocator<int>
> > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
> > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
> > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > >> std::less<boost::intrusive_ptr<PG> >,
> > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
> > > > > > >> [0x6b0e43]
> > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
> > > > > > >> std::allocator<PG*>
> > > > > > >> > const&,
> > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > >> std::allocator<PG*>
> > > > > > >> > const&,
> > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > >> [0xbb38ae]
> > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > >> 10: (()+0x8182) [0x7fd906946182]
> > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > >>
> > > > > > >> Also by monitoring (ceph -w) I get the following messages,
> > > > > > >> also lots of
> > > > > them.
> > > > > > >>
> > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > 10.20.0.13:0/1174409'
> > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush create-or-move",
> > "args":
> > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
> > > > > > >> dispatch
> > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > 10.20.0.13:0/1174483'
> > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush create-or-move",
> > "args":
> > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
> > > > > > >> dispatch
> > > > > > >>
> > > > > > >>
> > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are also
> > > > > > >> mons and mds's to save servers. All run Ubuntu 14.04.2.
> > > > > > >>
> > > > > > >> I have pretty much tried everything I could think of.
> > > > > > >>
> > > > > > >> Restarting daemons doesn't help.
> > > > > > >>
> > > > > > >> Any help would be appreciated. I can also provide more logs
> > > > > > >> if necessary. They just seem to get pretty large in few
> moments.
> > > > > > >>
> > > > > > >> Thank you
> > > > > > >> Tuomas
> > > > > > >>
> > > > > > >>
> > > > > > >> _______________________________________________
> > > > > > >> ceph-users mailing list
> > > > > > >> ceph-users@lists.ceph.com
> > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users@lists.ceph.com
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
[not found] ` <alpine.DEB.2.00.1504281256440.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-28 20:57 ` Sage Weil
[not found] ` <alpine.DEB.2.00.1504281355130.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-04-28 20:57 UTC (permalink / raw)
To: Tuomas Juntunen
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA
Hi Tuomas,
I've pushed an updated wip-hammer-snaps branch. Can you please try it?
The build will appear here
http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
(or a similar url; adjust for your distro).
Thanks!
sage
On Tue, 28 Apr 2015, Sage Weil wrote:
> [adding ceph-devel]
>
> Okay, I see the problem. This seems to be unrelated to the giant ->
> hammer move... it's a result of the tiering changes you made:
>
> > > > > > > The following:
> > > > > > >
> > > > > > > ceph osd tier add img images --force-nonempty
> > > > > > > ceph osd tier cache-mode images forward
> > > > > > > ceph osd tier set-overlay img images
>
> Specifically, --force-nonempty bypassed important safety checks.
>
> 1. images had snapshots (and removed_snaps)
>
> 2. images was added as a tier *of* img, and img's removed_snaps was copied
> to images, clobbering the removed_snaps value (see
> OSDMap::Incremental::propagate_snaps_to_tiers)
>
> 3. tiering relation was undone, but removed_snaps was still gone
>
> 4. on OSD startup, when we load the PG, removed_snaps is initialized with
> the older map. later, in PGPool::update(), we assume that removed_snaps
> always grows (never shrinks) and we trigger an assert.
>
> To fix this I think we need to do 2 things:
>
> 1. make the OSD forgiving about removed_snaps getting smaller. This is
> probably a good thing anyway: once we know snaps are removed on all OSDs
> we can prune the interval_set in the OSDMap. Maybe.
>
> 2. Fix the mon to prevent this from happening, *even* when
> --force-nonempty is specified. (This is the root cause.)
>
> I've opened http://tracker.ceph.com/issues/11493 to track this.
>
> sage
>
>
>
> > > > > > >
> > > > > > > Idea was to make images as a tier to img, move data to img
> > > > > > > then change
> > > > > > clients to use the new img pool.
> > > > > > >
> > > > > > > Br,
> > > > > > > Tuomas
> > > > > > >
> > > > > > > > Can you explain exactly what you mean by:
> > > > > > > >
> > > > > > > > "Also I created one pool for tier to be able to move data
> > > > > > > > without
> > > > > > outage."
> > > > > > > >
> > > > > > > > -Sam
> > > > > > > > ----- Original Message -----
> > > > > > > > From: "tuomas juntunen" <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer and
> > > > > > > > after some basic operations most of the OSD's went down
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Any solution for this yet?
> > > > > > > >
> > > > > > > > Br,
> > > > > > > > Tuomas
> > > > > > > >
> > > > > > > >> It looks like you may have hit
> > > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > > >>
> > > > > > > >> Ian R. Colle
> > > > > > > >> Global Director
> > > > > > > >> of Software Engineering
> > > > > > > >> Red Hat (Inktank is now part of Red Hat!)
> > > > > > > >> http://www.linkedin.com/in/ircolle
> > > > > > > >> http://www.twitter.com/ircolle
> > > > > > > >> Cell: +1.303.601.7713
> > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > > > > > >>
> > > > > > > >> ----- Original Message -----
> > > > > > > >> From: "tuomas juntunen" <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
> > > > > > > >> after some basic operations most of the OSD's went down
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > > >>
> > > > > > > >> Then created new pools and deleted some old ones. Also I
> > > > > > > >> created one pool for tier to be able to move data without
> > outage.
> > > > > > > >>
> > > > > > > >> After these operations all but 10 OSD's are down and
> > > > > > > >> creating this kind of messages to logs, I get more than
> > > > > > > >> 100gb of these in a
> > > > > night:
> > > > > > > >>
> > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > > >> NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > > >> NOTIFY] enter Started
> > > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > > >> NOTIFY] enter Start
> > > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > > >> NOTIFY]
> > > > > > > >> state<Start>: transitioning to Stray
> > > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > > >> NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive
> > > > > > > >> NOTIFY] enter Started/Stray
> > > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive] exit Reset 7.511623 45 0.000165
> > > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive] enter Started
> > > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive] enter Start
> > > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive]
> > > > > > > >> state<Start>: transitioning to Primary
> > > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive] exit Start 0.000023 0 0.000000
> > > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive] enter Started/Primary
> > > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> inactive] enter Started/Primary/Peering
> > > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0
> > > > > > > >> peering] enter Started/Primary/Peering/GetInfo
> > > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > > ./include/interval_set.h:
> > > > > > > >> In
> > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> > snapid_t]'
> > > > > > > >> thread
> > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> > > > > > > >>
> > > > > > > >> ceph version 0.94.1
> > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
> > > > > > > >> int, char
> > > > > > > >> const*)+0x8b)
> > > > > > > >> [0xbc271b]
> > > > > > > >> 2:
> > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t>
> > > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
> > > > > > > >> const>std::vector<int,
> > > > > > > >> std::allocator<int> >&, int, std::vector<int,
> > > > > > > >> std::allocator<int>
> > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
> > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
> > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
> > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
> > > > > > > >> [0x6b0e43]
> > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > > >> [0xbb38ae]
> > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
> > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > > >>
> > > > > > > >> Also by monitoring (ceph -w) I get the following messages,
> > > > > > > >> also lots of
> > > > > > them.
> > > > > > > >>
> > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174409'
> > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
> > > > > > > >> dispatch
> > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174483'
> > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
> > > > > > > >> dispatch
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are also
> > > > > > > >> mons and mds's to save servers. All run Ubuntu 14.04.2.
> > > > > > > >>
> > > > > > > >> I have pretty much tried everything I could think of.
> > > > > > > >>
> > > > > > > >> Restarting daemons doesn't help.
> > > > > > > >>
> > > > > > > >> Any help would be appreciated. I can also provide more logs
> > > > > > > >> if necessary. They just seem to get pretty large in few
> > moments.
> > > > > > > >>
> > > > > > > >> Thank you
> > > > > > > >> Tuomas
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> _______________________________________________
> > > > > > > >> ceph-users mailing list
> > > > > > > >> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
[not found] ` <alpine.DEB.2.00.1504281355130.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-29 4:16 ` Tuomas Juntunen
[not found] ` <81216125e573cf00539f61cc090b282b-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Tuomas Juntunen @ 2015-04-29 4:16 UTC (permalink / raw)
To: 'Sage Weil'
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 17530 bytes --]
Hi
I updated to that version and it seems that something did happen: the OSDs
stayed up for a while and 'ceph status' got updated. But then, in a couple of
minutes, they all went down in the same way.
I have attached a new 'ceph osd dump -f json-pretty' output and captured a new
log from one of the OSDs with osd debug = 20:
http://beta.xaasbox.com/ceph/ceph-osd.15.log
Thank you!
Br,
Tuomas
-----Original Message-----
From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org]
Sent: 28. huhtikuuta 2015 23:57
To: Tuomas Juntunen
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down
Hi Tuomas,
I've pushed an updated wip-hammer-snaps branch. Can you please try it?
The build will appear here
http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
(or a similar url; adjust for your distro).
Thanks!
sage
On Tue, 28 Apr 2015, Sage Weil wrote:
> [adding ceph-devel]
>
> Okay, I see the problem. This seems to be unrelated to the giant ->
> hammer move... it's a result of the tiering changes you made:
>
> > > > > > > The following:
> > > > > > >
> > > > > > > ceph osd tier add img images --force-nonempty ceph osd
> > > > > > > tier cache-mode images forward ceph osd tier set-overlay
> > > > > > > img images
>
> Specifically, --force-nonempty bypassed important safety checks.
>
> 1. images had snapshots (and removed_snaps)
>
> 2. images was added as a tier *of* img, and img's removed_snaps was
> copied to images, clobbering the removed_snaps value (see
> OSDMap::Incremental::propagate_snaps_to_tiers)
>
> 3. tiering relation was undone, but removed_snaps was still gone
>
> 4. on OSD startup, when we load the PG, removed_snaps is initialized
> with the older map. later, in PGPool::update(), we assume that
> removed_snaps always grows (never shrinks) and we trigger an assert.
>
> To fix this I think we need to do 2 things:
>
> 1. make the OSD forgiving about removed_snaps getting smaller. This is
> probably a good thing anyway: once we know snaps are removed on all
> OSDs we can prune the interval_set in the OSDMap. Maybe.
>
> 2. Fix the mon to prevent this from happening, *even* when
> --force-nonempty is specified. (This is the root cause.)
>
> I've opened http://tracker.ceph.com/issues/11493 to track this.
>
> sage
>
>
>
> > > > > > >
> > > > > > > Idea was to make images as a tier to img, move data to img
> > > > > > > then change
> > > > > > clients to use the new img pool.
> > > > > > >
> > > > > > > Br,
> > > > > > > Tuomas
> > > > > > >
> > > > > > > > Can you explain exactly what you mean by:
> > > > > > > >
> > > > > > > > "Also I created one pool for tier to be able to move
> > > > > > > > data without
> > > > > > outage."
> > > > > > > >
> > > > > > > > -Sam
> > > > > > > > ----- Original Message -----
> > > > > > > > From: "tuomas juntunen"
> > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
> > > > > > > > and after some basic operations most of the OSD's went
> > > > > > > > down
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Any solution for this yet?
> > > > > > > >
> > > > > > > > Br,
> > > > > > > > Tuomas
> > > > > > > >
> > > > > > > >> It looks like you may have hit
> > > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > > >>
> > > > > > > >> Ian R. Colle
> > > > > > > >> Global Director
> > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
> > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
> > > > > > > >> http://www.twitter.com/ircolle
> > > > > > > >> Cell: +1.303.601.7713
> > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > > > > > >>
> > > > > > > >> ----- Original Message -----
> > > > > > > >> From: "tuomas juntunen"
> > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
> > > > > > > >> after some basic operations most of the OSD's went down
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > > >>
> > > > > > > >> Then created new pools and deleted some old ones. Also
> > > > > > > >> I created one pool for tier to be able to move data
> > > > > > > >> without
> > outage.
> > > > > > > >>
> > > > > > > >> After these operations all but 10 OSD's are down and
> > > > > > > >> creating this kind of messages to logs, I get more than
> > > > > > > >> 100gb of these in a
> > > > > night:
> > > > > > > >>
> > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
> > pg_epoch:
> > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > >> n=0
> > > > > > > >> ec=1 les/c
> > > > > > > >> 16609/16659
> > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > >> pi=15659-16589/42
> > > > > > > >> crt=8480'7 lcod
> > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > >> inactive NOTIFY] enter Started
> > > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > >> inactive NOTIFY] enter Start
> > > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > >> inactive NOTIFY]
> > > > > > > >> state<Start>: transitioning to Stray
> > > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > >> les/c
> > > > > > > >> 17879/17879
> > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > >> inactive NOTIFY] enter Started/Stray
> > > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
> > > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive] enter Started
> > > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive] enter Start
> > > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive]
> > > > > > > >> state<Start>: transitioning to Primary
> > > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
> > > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive] enter Started/Primary
> > > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
> > > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5
> > > > > > > >> osd.23
> > > pg_epoch:
> > > >
> > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > >> 16127/16344
> > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
> > > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > > ./include/interval_set.h:
> > > > > > > >> In
> > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> > snapid_t]'
> > > > > > > >> thread
> > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
> > > > > > > >> 0)
> > > > > > > >>
> > > > > > > >> ceph version 0.94.1
> > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
> > > > > > > >> int, char
> > > > > > > >> const*)+0x8b)
> > > > > > > >> [0xbc271b]
> > > > > > > >> 2:
> > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
> > > > > > > >> >
> > > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
> > > > > > > >> const>std::vector<int,
> > > > > > > >> std::allocator<int> >&, int, std::vector<int,
> > > > > > > >> std::allocator<int>
> > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
> > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
> > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
> > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
> > > > > > > >> [0x6b0e43]
> > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > > >> std::allocator<PG*>
> > > > > > > >> > const&,
> > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > > >> [0xbb38ae]
> > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
> > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > > >>
> > > > > > > >> Also by monitoring (ceph -w) I get the following
> > > > > > > >> messages, also lots of
> > > > > > them.
> > > > > > > >>
> > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174409'
> > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
> > > > > > > >> create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
> > > > > > > >> dispatch
> > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > > 10.20.0.13:0/1174483'
> > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
> > > > > > > >> create-or-move",
> > > "args":
> > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
> > > > > > > >> dispatch
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
> > > > > > > >> also mons and mds's to save servers. All run Ubuntu
14.04.2.
> > > > > > > >>
> > > > > > > >> I have pretty much tried everything I could think of.
> > > > > > > >>
> > > > > > > >> Restarting daemons doesn't help.
> > > > > > > >>
> > > > > > > >> Any help would be appreciated. I can also provide more
> > > > > > > >> logs if necessary. They just seem to get pretty large
> > > > > > > >> in few
> > moments.
> > > > > > > >>
> > > > > > > >> Thank you
> > > > > > > >> Tuomas
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> _______________________________________________
> > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
[-- Attachment #2: 18610json.pretty.txt --]
[-- Type: text/plain, Size: 93942 bytes --]
{
"epoch": 18610,
"fsid": "a2974742-3805-4cd3-bc79-765f2bddaefe",
"created": "2014-10-15 20:43:45.186949",
"modified": "2015-04-29 06:49:32.691995",
"flags": "",
"cluster_snapshot": "",
"pool_max": 17,
"max_osd": 71,
"pools": [
{
"pool": 0,
"pool_name": "data",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 45,
"last_change": "1112",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 0,
"cache_target_full_ratio_micro": 0,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 1,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 1,
"pool_name": "metadata",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 0,
"last_change": "1114",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 0,
"cache_target_full_ratio_micro": 0,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 1,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 2,
"pool_name": "rbd",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 2,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 0,
"last_change": "1116",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 0,
"cache_target_full_ratio_micro": 0,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 1,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 3,
"pool_name": "volumes",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 0,
"last_change": "9974",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 23,
"snap_epoch": 9974,
"pool_snaps": [],
"removed_snaps": "[1~17]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "default",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 1,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 4,
"pool_name": "images",
"flags": 9,
"flags_names": "hashpspool,incomplete_clones",
"type": 1,
"size": 3,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 0,
"last_change": "17905",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 17882,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 0,
"cache_target_full_ratio_micro": 0,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "default",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 1,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 6,
"pool_name": "vms",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 0,
"last_change": "1122",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "default",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 1,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 7,
"pool_name": "san",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 1,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 4096,
"pg_placement_num": 4096,
"crash_replay_interval": 0,
"last_change": "14096",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 8,
"pool_name": "vol-ssd-accelerated",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 2,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 1024,
"pg_placement_num": 1024,
"crash_replay_interval": 0,
"last_change": "17861",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 14,
"pool_name": "backup",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 2,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 128,
"pg_placement_num": 128,
"crash_replay_interval": 0,
"last_change": "18018",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 15,
"pool_name": "img",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 2,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 256,
"pg_placement_num": 256,
"crash_replay_interval": 0,
"last_change": "18019",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 16,
"pool_name": "vm",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 2,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 1024,
"pg_placement_num": 1024,
"crash_replay_interval": 0,
"last_change": "18020",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
},
{
"pool": 17,
"pool_name": "infradisks",
"flags": 1,
"flags_names": "hashpspool",
"type": 1,
"size": 3,
"min_size": 2,
"crush_ruleset": 0,
"object_hash": 2,
"pg_num": 256,
"pg_placement_num": 256,
"crash_replay_interval": 0,
"last_change": "18021",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 0,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 400000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
}
],
"osds": [
{
"osd": 0,
"uuid": "757c3bc5-4d00-4344-8de4-82f5379c96af",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15738,
"last_clean_end": 17882,
"up_from": 18352,
"up_thru": 18353,
"down_at": 18415,
"lost_at": 0,
"public_addr": "10.20.0.11:6833\/2259607",
"cluster_addr": "10.20.0.11:6836\/2259607",
"heartbeat_back_addr": "10.20.0.11:6853\/2259607",
"heartbeat_front_addr": "10.20.0.11:6856\/2259607",
"state": [
"autoout",
"exists"
]
},
{
"osd": 1,
"uuid": "c7eaa4ac-99fc-46db-84aa-a67274896ec8",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15740,
"last_clean_end": 17882,
"up_from": 18350,
"up_thru": 18352,
"down_at": 18403,
"lost_at": 0,
"public_addr": "10.20.0.11:6813\/2259893",
"cluster_addr": "10.20.0.11:6814\/2259893",
"heartbeat_back_addr": "10.20.0.11:6815\/2259893",
"heartbeat_front_addr": "10.20.0.11:6825\/2259893",
"state": [
"autoout",
"exists"
]
},
{
"osd": 2,
"uuid": "206b2949-4adf-4789-8e06-f68a8ee819c9",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15739,
"last_clean_end": 17882,
"up_from": 18348,
"up_thru": 18348,
"down_at": 18415,
"lost_at": 0,
"public_addr": "10.20.0.11:6809\/2259657",
"cluster_addr": "10.20.0.11:6810\/2259657",
"heartbeat_back_addr": "10.20.0.11:6811\/2259657",
"heartbeat_front_addr": "10.20.0.11:6812\/2259657",
"state": [
"autoout",
"exists"
]
},
{
"osd": 3,
"uuid": "90b7c219-4dcd-48ea-a24d-f3b796a521e4",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15736,
"last_clean_end": 17882,
"up_from": 18346,
"up_thru": 18346,
"down_at": 18412,
"lost_at": 0,
"public_addr": "10.20.0.11:6829\/2257497",
"cluster_addr": "10.20.0.11:6830\/2257497",
"heartbeat_back_addr": "10.20.0.11:6831\/2257497",
"heartbeat_front_addr": "10.20.0.11:6832\/2257497",
"state": [
"autoout",
"exists"
]
},
{
"osd": 4,
"uuid": "049ef94f-121a-4e71-8ba6-27eaebf0a569",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15737,
"last_clean_end": 17883,
"up_from": 18342,
"up_thru": 18345,
"down_at": 18415,
"lost_at": 0,
"public_addr": "10.20.0.11:6861\/2257349",
"cluster_addr": "10.20.0.11:6862\/2257349",
"heartbeat_back_addr": "10.20.0.11:6863\/2257349",
"heartbeat_front_addr": "10.20.0.11:6864\/2257349",
"state": [
"autoout",
"exists"
]
},
{
"osd": 5,
"uuid": "2437a53b-339e-45af-b4de-0fc675d27405",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15734,
"last_clean_end": 17882,
"up_from": 18347,
"up_thru": 18347,
"down_at": 18403,
"lost_at": 0,
"public_addr": "10.20.0.11:6821\/2256278",
"cluster_addr": "10.20.0.11:6822\/2256278",
"heartbeat_back_addr": "10.20.0.11:6823\/2256278",
"heartbeat_front_addr": "10.20.0.11:6824\/2256278",
"state": [
"autoout",
"exists"
]
},
{
"osd": 6,
"uuid": "f117ceed-b1fd-4069-99fe-b7aba9f3ef8d",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15738,
"last_clean_end": 17882,
"up_from": 18349,
"up_thru": 18349,
"down_at": 18415,
"lost_at": 0,
"public_addr": "10.20.0.11:6854\/2257155",
"cluster_addr": "10.20.0.11:6855\/2257155",
"heartbeat_back_addr": "10.20.0.11:6857\/2257155",
"heartbeat_front_addr": "10.20.0.11:6859\/2257155",
"state": [
"autoout",
"exists"
]
},
{
"osd": 7,
"uuid": "e98e9b8a-9c62-4e3e-bdb4-c2c30103c0c1",
"up": 0,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15730,
"last_clean_end": 17883,
"up_from": 18345,
"up_thru": 18345,
"down_at": 18419,
"lost_at": 0,
"public_addr": "10.20.0.11:6873\/2258645",
"cluster_addr": "10.20.0.11:6874\/2258645",
"heartbeat_back_addr": "10.20.0.11:6875\/2258645",
"heartbeat_front_addr": "10.20.0.11:6876\/2258645",
"state": [
"exists"
]
},
{
"osd": 8,
"uuid": "41e471cd-fafe-4422-8bf5-22018bbe1375",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 17795,
"last_clean_end": 17882,
"up_from": 18346,
"up_thru": 18346,
"down_at": 18412,
"lost_at": 0,
"public_addr": "10.20.0.11:6877\/2258943",
"cluster_addr": "10.20.0.11:6878\/2258943",
"heartbeat_back_addr": "10.20.0.11:6879\/2258943",
"heartbeat_front_addr": "10.20.0.11:6880\/2258943",
"state": [
"autoout",
"exists"
]
},
{
"osd": 9,
"uuid": "d68eeebd-d058-4b1c-a30a-994bf8fc8030",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15733,
"last_clean_end": 17882,
"up_from": 18347,
"up_thru": 18347,
"down_at": 18410,
"lost_at": 0,
"public_addr": "10.20.0.11:6849\/2258152",
"cluster_addr": "10.20.0.11:6850\/2258152",
"heartbeat_back_addr": "10.20.0.11:6851\/2258152",
"heartbeat_front_addr": "10.20.0.11:6852\/2258152",
"state": [
"autoout",
"exists"
]
},
{
"osd": 10,
"uuid": "660747d6-3f47-449a-bc69-5399b0d54ff6",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15736,
"last_clean_end": 17883,
"up_from": 18345,
"up_thru": 18346,
"down_at": 18403,
"lost_at": 0,
"public_addr": "10.20.0.11:6841\/2256646",
"cluster_addr": "10.20.0.11:6842\/2256646",
"heartbeat_back_addr": "10.20.0.11:6843\/2256646",
"heartbeat_front_addr": "10.20.0.11:6844\/2256646",
"state": [
"autoout",
"exists"
]
},
{
"osd": 11,
"uuid": "805965b1-127f-44a6-9a05-8a643eb7a512",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 17801,
"last_clean_end": 17882,
"up_from": 18349,
"up_thru": 18351,
"down_at": 18439,
"lost_at": 0,
"public_addr": "10.20.0.11:6801\/2257816",
"cluster_addr": "10.20.0.11:6802\/2257816",
"heartbeat_back_addr": "10.20.0.11:6803\/2257816",
"heartbeat_front_addr": "10.20.0.11:6804\/2257816",
"state": [
"autoout",
"exists"
]
},
{
"osd": 12,
"uuid": "61fbfcbe-d642-478f-9620-f9d72ee96238",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18162,
"up_thru": 18162,
"down_at": 18208,
"lost_at": 0,
"public_addr": "10.20.0.12:6833\/3261949",
"cluster_addr": "10.20.0.12:6834\/3261949",
"heartbeat_back_addr": "10.20.0.12:6835\/3261949",
"heartbeat_front_addr": "10.20.0.12:6836\/3261949",
"state": [
"autoout",
"exists"
]
},
{
"osd": 13,
"uuid": "6faad33b-00be-42a4-92ba-08be5ab7f995",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18164,
"up_thru": 18166,
"down_at": 18206,
"lost_at": 0,
"public_addr": "10.20.0.12:6885\/3262416",
"cluster_addr": "10.20.0.12:6886\/3262416",
"heartbeat_back_addr": "10.20.0.12:6887\/3262416",
"heartbeat_front_addr": "10.20.0.12:6888\/3262416",
"state": [
"autoout",
"exists"
]
},
{
"osd": 14,
"uuid": "f301705b-e725-443d-96e1-d9ec9aafe657",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17883,
"up_from": 18164,
"up_thru": 18164,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6861\/3261624",
"cluster_addr": "10.20.0.12:6862\/3261624",
"heartbeat_back_addr": "10.20.0.12:6863\/3261624",
"heartbeat_front_addr": "10.20.0.12:6864\/3261624",
"state": [
"autoout",
"exists"
]
},
{
"osd": 15,
"uuid": "536bf483-10de-44b0-8e1e-4f349fbe572a",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15785,
"last_clean_end": 17882,
"up_from": 18168,
"up_thru": 18168,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6805\/3262650",
"cluster_addr": "10.20.0.12:6806\/3262650",
"heartbeat_back_addr": "10.20.0.12:6807\/3262650",
"heartbeat_front_addr": "10.20.0.12:6808\/3262650",
"state": [
"autoout",
"exists"
]
},
{
"osd": 16,
"uuid": "4185bd20-8eb0-4616-b36e-bacb181ae40e",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18164,
"up_thru": 18164,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6849\/3261188",
"cluster_addr": "10.20.0.12:6850\/3261188",
"heartbeat_back_addr": "10.20.0.12:6851\/3261188",
"heartbeat_front_addr": "10.20.0.12:6852\/3261188",
"state": [
"autoout",
"exists"
]
},
{
"osd": 17,
"uuid": "a6f2f5b4-477f-48f9-9acf-d5b7a6c88b98",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18164,
"up_thru": 18166,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6857\/3261610",
"cluster_addr": "10.20.0.12:6858\/3261610",
"heartbeat_back_addr": "10.20.0.12:6859\/3261610",
"heartbeat_front_addr": "10.20.0.12:6860\/3261610",
"state": [
"autoout",
"exists"
]
},
{
"osd": 18,
"uuid": "b31b0bd8-938a-496d-91bc-19bf4f794f82",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17883,
"up_from": 18164,
"up_thru": 18166,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6869\/3261788",
"cluster_addr": "10.20.0.12:6870\/3261788",
"heartbeat_back_addr": "10.20.0.12:6871\/3261788",
"heartbeat_front_addr": "10.20.0.12:6872\/3261788",
"state": [
"autoout",
"exists"
]
},
{
"osd": 19,
"uuid": "d76b6bd5-1ef3-436c-a75d-3587c515eb56",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18150,
"up_thru": 18150,
"down_at": 18203,
"lost_at": 0,
"public_addr": "10.20.0.12:6865\/3261778",
"cluster_addr": "10.20.0.12:6866\/3261778",
"heartbeat_back_addr": "10.20.0.12:6867\/3261778",
"heartbeat_front_addr": "10.20.0.12:6868\/3261778",
"state": [
"autoout",
"exists"
]
},
{
"osd": 20,
"uuid": "8e4dd982-a4c5-4ca5-9fc5-243f55c4db57",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18151,
"up_thru": 18151,
"down_at": 18239,
"lost_at": 0,
"public_addr": "10.20.0.12:6881\/3262190",
"cluster_addr": "10.20.0.12:6882\/3262190",
"heartbeat_back_addr": "10.20.0.12:6883\/3262190",
"heartbeat_front_addr": "10.20.0.12:6884\/3262190",
"state": [
"autoout",
"exists"
]
},
{
"osd": 21,
"uuid": "760aaf28-0a34-4bbc-af0c-2654b0a43fff",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18150,
"up_thru": 18150,
"down_at": 18203,
"lost_at": 0,
"public_addr": "10.20.0.12:6845\/3261106",
"cluster_addr": "10.20.0.12:6846\/3261106",
"heartbeat_back_addr": "10.20.0.12:6847\/3261106",
"heartbeat_front_addr": "10.20.0.12:6848\/3261106",
"state": [
"autoout",
"exists"
]
},
{
"osd": 22,
"uuid": "40322a34-ab31-4760-b71e-a7672f812cb3",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18161,
"up_thru": 18161,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6853\/3261379",
"cluster_addr": "10.20.0.12:6854\/3261379",
"heartbeat_back_addr": "10.20.0.12:6855\/3261379",
"heartbeat_front_addr": "10.20.0.12:6856\/3261379",
"state": [
"autoout",
"exists"
]
},
{
"osd": 23,
"uuid": "e1d81949-f4b5-4cf2-b6af-dccaaeb30ed7",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15783,
"last_clean_end": 17882,
"up_from": 18165,
"up_thru": 18166,
"down_at": 18352,
"lost_at": 0,
"public_addr": "10.20.0.12:6873\/3262047",
"cluster_addr": "10.20.0.12:6874\/3262047",
"heartbeat_back_addr": "10.20.0.12:6875\/3262047",
"heartbeat_front_addr": "10.20.0.12:6876\/3262047",
"state": [
"autoout",
"exists"
]
},
{
"osd": 24,
"uuid": "ede77283-a423-4c6b-9c6e-b0e807c63cb5",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18520,
"last_clean_end": 18582,
"up_from": 18589,
"up_thru": 18592,
"down_at": 18588,
"lost_at": 0,
"public_addr": "10.20.0.13:6801\/3842583",
"cluster_addr": "10.20.0.13:6839\/3842583",
"heartbeat_back_addr": "10.20.0.13:6840\/3842583",
"heartbeat_front_addr": "10.20.0.13:6841\/3842583",
"state": [
"exists",
"up"
]
},
{
"osd": 25,
"uuid": "7cfe85f8-3ae9-493d-9801-025ff6c6265d",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15686,
"last_clean_end": 17883,
"up_from": 18426,
"up_thru": 18426,
"down_at": 18518,
"lost_at": 0,
"public_addr": "10.20.0.13:6829\/3788954",
"cluster_addr": "10.20.0.13:6830\/3788954",
"heartbeat_back_addr": "10.20.0.13:6831\/3788954",
"heartbeat_front_addr": "10.20.0.13:6832\/3788954",
"state": [
"autoout",
"exists"
]
},
{
"osd": 26,
"uuid": "266f6d70-519f-4c24-bca2-236495a600a7",
"up": 0,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15692,
"last_clean_end": 17883,
"up_from": 18420,
"up_thru": 18421,
"down_at": 18542,
"lost_at": 0,
"public_addr": "10.20.0.13:6873\/3788357",
"cluster_addr": "10.20.0.13:6874\/3788357",
"heartbeat_back_addr": "10.20.0.13:6875\/3788357",
"heartbeat_front_addr": "10.20.0.13:6876\/3788357",
"state": [
"exists"
]
},
{
"osd": 27,
"uuid": "68644fa9-9459-4db0-a6c9-01661645038b",
"up": 0,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15689,
"last_clean_end": 17883,
"up_from": 18420,
"up_thru": 18421,
"down_at": 18527,
"lost_at": 0,
"public_addr": "10.20.0.13:6813\/3788083",
"cluster_addr": "10.20.0.13:6814\/3788083",
"heartbeat_back_addr": "10.20.0.13:6815\/3788083",
"heartbeat_front_addr": "10.20.0.13:6816\/3788083",
"state": [
"exists"
]
},
{
"osd": 28,
"uuid": "fc3d5749-7673-4100-a0d4-f25e9cc0bc88",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15688,
"last_clean_end": 17882,
"up_from": 18424,
"up_thru": 18424,
"down_at": 18518,
"lost_at": 0,
"public_addr": "10.20.0.13:6825\/3789248",
"cluster_addr": "10.20.0.13:6826\/3789248",
"heartbeat_back_addr": "10.20.0.13:6827\/3789248",
"heartbeat_front_addr": "10.20.0.13:6828\/3789248",
"state": [
"autoout",
"exists"
]
},
{
"osd": 29,
"uuid": "cb5feda9-de3f-4e42-bb73-7945b4928b22",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18511,
"last_clean_end": 18534,
"up_from": 18544,
"up_thru": 18571,
"down_at": 18543,
"lost_at": 0,
"public_addr": "10.20.0.13:6817\/3815548",
"cluster_addr": "10.20.0.13:6868\/3815548",
"heartbeat_back_addr": "10.20.0.13:6869\/3815548",
"heartbeat_front_addr": "10.20.0.13:6870\/3815548",
"state": [
"exists",
"up"
]
},
{
"osd": 30,
"uuid": "ef1e65bb-a634-4096-9466-1262af55db01",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15693,
"last_clean_end": 17882,
"up_from": 18437,
"up_thru": 18585,
"down_at": 17884,
"lost_at": 0,
"public_addr": "10.20.0.13:6833\/3787367",
"cluster_addr": "10.20.0.13:6834\/3787367",
"heartbeat_back_addr": "10.20.0.13:6835\/3787367",
"heartbeat_front_addr": "10.20.0.13:6836\/3787367",
"state": [
"exists",
"up"
]
},
{
"osd": 31,
"uuid": "3dad6393-67a8-43d4-ba8d-ffd320827396",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18534,
"last_clean_end": 18551,
"up_from": 18562,
"up_thru": 18581,
"down_at": 18561,
"lost_at": 0,
"public_addr": "10.20.0.13:6842\/3819894",
"cluster_addr": "10.20.0.13:6864\/3819894",
"heartbeat_back_addr": "10.20.0.13:6865\/3819894",
"heartbeat_front_addr": "10.20.0.13:6871\/3819894",
"state": [
"exists",
"up"
]
},
{
"osd": 32,
"uuid": "db6f3afa-53ed-453a-97e3-861e88cb818f",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15684,
"last_clean_end": 17882,
"up_from": 18419,
"up_thru": 18420,
"down_at": 18523,
"lost_at": 0,
"public_addr": "10.20.0.13:6809\/3786362",
"cluster_addr": "10.20.0.13:6810\/3786362",
"heartbeat_back_addr": "10.20.0.13:6811\/3786362",
"heartbeat_front_addr": "10.20.0.13:6812\/3786362",
"state": [
"autoout",
"exists"
]
},
{
"osd": 33,
"uuid": "d5e59852-06b4-4a30-8c5e-ff7e328b5455",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18508,
"last_clean_end": 18534,
"up_from": 18551,
"up_thru": 18577,
"down_at": 18550,
"lost_at": 0,
"public_addr": "10.20.0.13:6809\/3817103",
"cluster_addr": "10.20.0.13:6810\/3817103",
"heartbeat_back_addr": "10.20.0.13:6811\/3817103",
"heartbeat_front_addr": "10.20.0.13:6812\/3817103",
"state": [
"exists",
"up"
]
},
{
"osd": 34,
"uuid": "f35a10c5-217a-4cfb-88b9-7334bda441b8",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18521,
"last_clean_end": 18572,
"up_from": 18592,
"up_thru": 18592,
"down_at": 18591,
"lost_at": 0,
"public_addr": "10.20.0.13:6805\/3842840",
"cluster_addr": "10.20.0.13:6819\/3842840",
"heartbeat_back_addr": "10.20.0.13:6820\/3842840",
"heartbeat_front_addr": "10.20.0.13:6821\/3842840",
"state": [
"exists",
"up"
]
},
{
"osd": 35,
"uuid": "335e797f-a390-4f08-9da6-9ab76ffb12ae",
"up": 0,
"in": 0,
"weight": 0.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 15687,
"last_clean_end": 17882,
"up_from": 18424,
"up_thru": 18424,
"down_at": 18498,
"lost_at": 0,
"public_addr": "10.20.0.13:6861\/3787537",
"cluster_addr": "10.20.0.13:6862\/3787537",
"heartbeat_back_addr": "10.20.0.13:6863\/3787537",
"heartbeat_front_addr": "10.20.0.13:6864\/3787537",
"state": [
"autoout",
"exists"
]
},
{
"osd": 36,
"uuid": "33c11fa1-1b03-42e4-8296-dc55ba052b35",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18599,
"last_clean_end": 18600,
"up_from": 18606,
"up_thru": 18609,
"down_at": 18605,
"lost_at": 0,
"public_addr": "10.20.0.12:6829\/3479135",
"cluster_addr": "10.20.0.12:6830\/3479135",
"heartbeat_back_addr": "10.20.0.12:6831\/3479135",
"heartbeat_front_addr": "10.20.0.12:6832\/3479135",
"state": [
"exists",
"up"
]
},
{
"osd": 37,
"uuid": "a97a791a-fe36-438b-80e2-db2a0d5e8e27",
"up": 1,
"in": 1,
"weight": 1.000000,
"primary_affinity": 1.000000,
"last_clean_begin": 18596,
"last_clean_end": 18600,
"up_from": 18609,
"up_thru": 18609,
"down_at": 18608,
"lost_at": 0,
"public_addr": "10.20.0.12:6889\/3481637",
"cluster_addr": "10.20.0.12:6890\/3481637",
"heartbeat_back_addr": "10.20.0.12:6891\/3481637",
"heartbeat_front_addr": "10.20.0.12:6894\/3481637",
"state": [
"exists",
"up"
]
}
],
"osd_xinfo": [
{
"osd": 0,
"down_stamp": "2015-04-29 06:36:11.510911",
"laggy_probability": 0.648970,
"laggy_interval": 32,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 1,
"down_stamp": "2015-04-29 06:35:39.342646",
"laggy_probability": 0.627290,
"laggy_interval": 30,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 2,
"down_stamp": "2015-04-29 06:36:11.510911",
"laggy_probability": 0.617737,
"laggy_interval": 47,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 3,
"down_stamp": "2015-04-29 06:36:06.479824",
"laggy_probability": 0.660475,
"laggy_interval": 28,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 4,
"down_stamp": "2015-04-29 06:36:11.510911",
"laggy_probability": 0.642416,
"laggy_interval": 39,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 5,
"down_stamp": "2015-04-29 06:35:39.342646",
"laggy_probability": 0.617737,
"laggy_interval": 10,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 6,
"down_stamp": "2015-04-29 06:36:11.510911",
"laggy_probability": 0.642416,
"laggy_interval": 41,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 7,
"down_stamp": "2015-04-29 06:38:05.135599",
"laggy_probability": 0.642416,
"laggy_interval": 66,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 8,
"down_stamp": "2015-04-29 06:36:06.479824",
"laggy_probability": 0.449691,
"laggy_interval": 40,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 9,
"down_stamp": "2015-04-29 06:35:59.293041",
"laggy_probability": 0.643535,
"laggy_interval": 16,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 10,
"down_stamp": "2015-04-29 06:35:39.342646",
"laggy_probability": 0.616699,
"laggy_interval": 48,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 11,
"down_stamp": "2015-04-29 06:38:34.318677",
"laggy_probability": 0.422864,
"laggy_interval": 22,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 12,
"down_stamp": "2015-04-29 06:30:10.761975",
"laggy_probability": 0.594721,
"laggy_interval": 41,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 13,
"down_stamp": "2015-04-29 06:30:08.803695",
"laggy_probability": 0.601756,
"laggy_interval": 29,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 14,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.663821,
"laggy_interval": 21,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 15,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.661855,
"laggy_interval": 18,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 16,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.663889,
"laggy_interval": 13,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 17,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.541368,
"laggy_interval": 50,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 18,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.622311,
"laggy_interval": 35,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 19,
"down_stamp": "2015-04-29 06:30:02.919322",
"laggy_probability": 0.651860,
"laggy_interval": 20,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 20,
"down_stamp": "2015-04-29 06:30:45.855010",
"laggy_probability": 0.626463,
"laggy_interval": 30,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 21,
"down_stamp": "2015-04-29 06:30:02.919322",
"laggy_probability": 0.653627,
"laggy_interval": 9,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 22,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.666169,
"laggy_interval": 12,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 23,
"down_stamp": "2015-04-29 06:34:32.372745",
"laggy_probability": 0.594888,
"laggy_interval": 45,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 24,
"down_stamp": "2015-04-29 06:45:16.246255",
"laggy_probability": 0.193668,
"laggy_interval": 10,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 25,
"down_stamp": "2015-04-29 06:40:01.722875",
"laggy_probability": 0.567685,
"laggy_interval": 36,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 26,
"down_stamp": "2015-04-29 06:40:42.614902",
"laggy_probability": 0.601077,
"laggy_interval": 92,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 27,
"down_stamp": "2015-04-29 06:40:14.223004",
"laggy_probability": 0.557502,
"laggy_interval": 49,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 28,
"down_stamp": "2015-04-29 06:40:01.722875",
"laggy_probability": 0.635835,
"laggy_interval": 27,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 29,
"down_stamp": "2015-04-29 06:40:43.818245",
"laggy_probability": 0.251127,
"laggy_interval": 17,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 30,
"down_stamp": "2015-04-26 14:21:27.940755",
"laggy_probability": 0.606626,
"laggy_interval": 30,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 31,
"down_stamp": "2015-04-29 06:41:16.132199",
"laggy_probability": 0.145557,
"laggy_interval": 7,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 32,
"down_stamp": "2015-04-29 06:40:06.732853",
"laggy_probability": 0.568801,
"laggy_interval": 37,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 33,
"down_stamp": "2015-04-29 06:40:52.364979",
"laggy_probability": 0.273623,
"laggy_interval": 21,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 34,
"down_stamp": "2015-04-29 06:45:19.569449",
"laggy_probability": 0.233592,
"laggy_interval": 36,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 35,
"down_stamp": "2015-04-29 06:39:41.678784",
"laggy_probability": 0.492127,
"laggy_interval": 32,
"features": 1125899906842623,
"old_weight": 65536
},
{
"osd": 36,
"down_stamp": "2015-04-29 06:49:26.582575",
"laggy_probability": 0.048084,
"laggy_interval": 0,
"features": 1125899906842623,
"old_weight": 0
},
{
"osd": 37,
"down_stamp": "2015-04-29 06:49:30.662891",
"laggy_probability": 0.140542,
"laggy_interval": 7,
"features": 1125899906842623,
"old_weight": 65536
}
],
"pg_temp": [
{
"pgid": "0.31",
"osds": [
24,
37
]
},
{
"pgid": "0.ac",
"osds": [
33,
37
]
},
{
"pgid": "0.d4",
"osds": [
24,
36
]
},
{
"pgid": "0.169",
"osds": [
24,
37
]
},
{
"pgid": "0.1b8",
"osds": [
31,
37
]
},
{
"pgid": "0.1d2",
"osds": [
31,
37
]
},
{
"pgid": "0.1f0",
"osds": [
31,
36
]
},
{
"pgid": "0.600",
"osds": [
37,
17,
36
]
},
{
"pgid": "0.855",
"osds": [
29,
37
]
},
{
"pgid": "0.87a",
"osds": [
29,
36
]
},
{
"pgid": "0.8d8",
"osds": [
31,
37
]
},
{
"pgid": "0.97a",
"osds": [
37,
22,
36
]
},
{
"pgid": "0.a12",
"osds": [
36,
28,
37
]
},
{
"pgid": "0.a1c",
"osds": [
37,
14,
36
]
},
{
"pgid": "0.ad4",
"osds": [
34,
36
]
},
{
"pgid": "0.aef",
"osds": [
30,
36
]
},
{
"pgid": "0.b30",
"osds": [
29,
36
]
},
{
"pgid": "0.b7f",
"osds": [
37,
15,
36
]
},
{
"pgid": "0.ba5",
"osds": [
30,
36
]
},
{
"pgid": "0.bc3",
"osds": [
34,
36
]
},
{
"pgid": "0.d9c",
"osds": [
37,
9,
36
]
},
{
"pgid": "0.e16",
"osds": [
33,
37
]
},
{
"pgid": "0.e71",
"osds": [
33,
36
]
},
{
"pgid": "0.eab",
"osds": [
24,
36
]
},
{
"pgid": "0.ef9",
"osds": [
34,
36
]
},
{
"pgid": "0.f09",
"osds": [
36,
35,
37
]
},
{
"pgid": "0.f32",
"osds": [
37,
26,
36
]
},
{
"pgid": "0.f37",
"osds": [
37,
18,
36
]
},
{
"pgid": "0.fc0",
"osds": [
34,
36
]
},
{
"pgid": "0.fd4",
"osds": [
31,
36
]
},
{
"pgid": "0.fdf",
"osds": [
30,
36
]
},
{
"pgid": "1.31",
"osds": [
24,
37
]
},
{
"pgid": "1.39",
"osds": [
33,
37
]
},
{
"pgid": "1.4c",
"osds": [
30,
37
]
},
{
"pgid": "1.68",
"osds": [
31,
36
]
},
{
"pgid": "1.6b",
"osds": [
37,
15,
36
]
},
{
"pgid": "1.9f",
"osds": [
34,
36
]
},
{
"pgid": "1.dd",
"osds": [
33,
37
]
},
{
"pgid": "1.174",
"osds": [
37,
23,
36
]
},
{
"pgid": "1.178",
"osds": [
30,
36
]
},
{
"pgid": "1.1c2",
"osds": [
30,
37
]
},
{
"pgid": "1.1f8",
"osds": [
29,
36
]
},
{
"pgid": "1.1fc",
"osds": [
37,
15,
36
]
},
{
"pgid": "1.40a",
"osds": [
37,
17,
36
]
},
{
"pgid": "1.4b3",
"osds": [
33,
37
]
},
{
"pgid": "1.53f",
"osds": [
37,
16,
36
]
},
{
"pgid": "1.5cc",
"osds": [
37,
16,
36
]
},
{
"pgid": "1.82b",
"osds": [
36,
25,
37
]
},
{
"pgid": "1.90d",
"osds": [
37,
15,
36
]
},
{
"pgid": "1.9ec",
"osds": [
36,
32,
37
]
},
{
"pgid": "1.9ff",
"osds": [
34,
37
]
},
{
"pgid": "1.a6d",
"osds": [
24,
36
]
},
{
"pgid": "1.b76",
"osds": [
33,
36
]
},
{
"pgid": "1.b8a",
"osds": [
29,
36
]
},
{
"pgid": "1.c7a",
"osds": [
31,
36
]
},
{
"pgid": "1.cb9",
"osds": [
33,
36
]
},
{
"pgid": "1.ced",
"osds": [
31,
36
]
},
{
"pgid": "1.d05",
"osds": [
24,
37
]
},
{
"pgid": "1.d30",
"osds": [
31,
36
]
},
{
"pgid": "1.d7a",
"osds": [
31,
36
]
},
{
"pgid": "1.ddf",
"osds": [
34,
36
]
},
{
"pgid": "1.e0f",
"osds": [
33,
37
]
},
{
"pgid": "1.e4f",
"osds": [
29,
37
]
},
{
"pgid": "1.e97",
"osds": [
33,
36
]
},
{
"pgid": "1.efd",
"osds": [
30,
36
]
},
{
"pgid": "1.f2c",
"osds": [
37,
22,
36
]
},
{
"pgid": "1.f3d",
"osds": [
30,
37
]
},
{
"pgid": "1.f4b",
"osds": [
34,
31,
36
]
},
{
"pgid": "1.f9a",
"osds": [
30,
36
]
},
{
"pgid": "1.fca",
"osds": [
30,
37
]
},
{
"pgid": "2.76",
"osds": [
31,
37
]
},
{
"pgid": "2.c4",
"osds": [
34,
37
]
},
{
"pgid": "2.150",
"osds": [
33,
24
]
},
{
"pgid": "2.159",
"osds": [
31,
37
]
},
{
"pgid": "2.1b4",
"osds": [
34,
37
]
},
{
"pgid": "2.1cf",
"osds": [
29,
24
]
},
{
"pgid": "2.1fa",
"osds": [
30,
36
]
},
{
"pgid": "2.545",
"osds": [
33,
24
]
},
{
"pgid": "2.7e4",
"osds": [
31,
24
]
},
{
"pgid": "2.ab7",
"osds": [
29,
24
]
},
{
"pgid": "2.d25",
"osds": [
34,
36
]
},
{
"pgid": "2.dbd",
"osds": [
36,
24
]
},
{
"pgid": "2.e69",
"osds": [
34,
24
]
},
{
"pgid": "2.e8d",
"osds": [
31,
24
]
},
{
"pgid": "2.ef9",
"osds": [
33,
36
]
},
{
"pgid": "2.f50",
"osds": [
31,
37
]
},
{
"pgid": "2.f5f",
"osds": [
30,
24
]
},
{
"pgid": "2.f9b",
"osds": [
30,
24
]
},
{
"pgid": "2.fea",
"osds": [
31,
37
]
},
{
"pgid": "3.64",
"osds": [
37,
18,
36
]
},
{
"pgid": "3.c6",
"osds": [
31,
36
]
},
{
"pgid": "3.f8",
"osds": [
24,
36
]
},
{
"pgid": "3.194",
"osds": [
24,
37
]
},
{
"pgid": "3.1a9",
"osds": [
36,
27,
37
]
},
{
"pgid": "3.686",
"osds": [
37,
16,
36
]
},
{
"pgid": "3.98f",
"osds": [
30,
36
]
},
{
"pgid": "3.a88",
"osds": [
37,
17,
36
]
},
{
"pgid": "3.acb",
"osds": [
37,
15,
36
]
},
{
"pgid": "3.ae0",
"osds": [
29,
36
]
},
{
"pgid": "3.b74",
"osds": [
37,
18,
36
]
},
{
"pgid": "3.c0f",
"osds": [
37,
14,
36
]
},
{
"pgid": "3.c50",
"osds": [
30,
36
]
},
{
"pgid": "3.c65",
"osds": [
37,
9,
36
]
},
{
"pgid": "3.d05",
"osds": [
31,
36
]
},
{
"pgid": "3.d8f",
"osds": [
0,
37,
36
]
},
{
"pgid": "3.de5",
"osds": [
29,
36
]
},
{
"pgid": "3.edd",
"osds": [
37,
1,
36
]
},
{
"pgid": "3.ef5",
"osds": [
34,
31,
36
]
},
{
"pgid": "3.ef6",
"osds": [
30,
36
]
},
{
"pgid": "3.ef7",
"osds": [
29,
36
]
},
{
"pgid": "3.f01",
"osds": [
37,
26,
36
]
},
{
"pgid": "3.f34",
"osds": [
30,
37
]
},
{
"pgid": "3.f35",
"osds": [
30,
36
]
},
{
"pgid": "3.f47",
"osds": [
31,
36
]
},
{
"pgid": "3.f8f",
"osds": [
33,
37
]
},
{
"pgid": "3.fb6",
"osds": [
33,
36
]
},
{
"pgid": "3.fdb",
"osds": [
34,
36
]
},
{
"pgid": "4.5",
"osds": [
31,
36
]
},
{
"pgid": "4.34",
"osds": [
30,
37
]
},
{
"pgid": "4.3f",
"osds": [
29,
37
]
},
{
"pgid": "4.84",
"osds": [
34,
37
]
},
{
"pgid": "4.93",
"osds": [
37,
32,
36
]
},
{
"pgid": "4.156",
"osds": [
31,
36
]
},
{
"pgid": "4.165",
"osds": [
29,
36
]
},
{
"pgid": "4.17b",
"osds": [
30,
36
]
},
{
"pgid": "4.17d",
"osds": [
24,
37
]
},
{
"pgid": "4.17e",
"osds": [
30,
37
]
},
{
"pgid": "4.182",
"osds": [
29,
37
]
},
{
"pgid": "4.194",
"osds": [
37,
26,
36
]
},
{
"pgid": "4.1a3",
"osds": [
29,
37
]
},
{
"pgid": "4.1aa",
"osds": [
37,
18,
36
]
},
{
"pgid": "4.1c1",
"osds": [
34,
36
]
},
{
"pgid": "4.1c2",
"osds": [
31,
37
]
},
{
"pgid": "4.1d6",
"osds": [
37,
15,
36
]
},
{
"pgid": "4.649",
"osds": [
37,
7,
36
]
},
{
"pgid": "4.703",
"osds": [
36,
32,
37
]
},
{
"pgid": "4.73d",
"osds": [
37,
17,
36
]
},
{
"pgid": "4.787",
"osds": [
29,
36
]
},
{
"pgid": "4.90e",
"osds": [
29,
36
]
},
{
"pgid": "4.a5a",
"osds": [
29,
36
]
},
{
"pgid": "4.ab2",
"osds": [
32,
37,
36
]
},
{
"pgid": "4.ab3",
"osds": [
0,
37,
36
]
},
{
"pgid": "4.ae8",
"osds": [
34,
36
]
},
{
"pgid": "4.bc7",
"osds": [
24,
36
]
},
{
"pgid": "4.c04",
"osds": [
33,
24,
37
]
},
{
"pgid": "4.c10",
"osds": [
31,
36
]
},
{
"pgid": "4.c33",
"osds": [
37,
15,
36
]
},
{
"pgid": "4.c46",
"osds": [
34,
36
]
},
{
"pgid": "4.d1b",
"osds": [
9,
36,
37
]
},
{
"pgid": "4.d66",
"osds": [
37,
23,
36
]
},
{
"pgid": "4.d73",
"osds": [
9,
37,
36
]
},
{
"pgid": "4.dc4",
"osds": [
37,
17,
36
]
},
{
"pgid": "4.e1a",
"osds": [
24,
36
]
},
{
"pgid": "4.e3c",
"osds": [
34,
36
]
},
{
"pgid": "4.e60",
"osds": [
33,
36
]
},
{
"pgid": "4.e80",
"osds": [
37,
8,
36
]
},
{
"pgid": "4.e92",
"osds": [
24,
37
]
},
{
"pgid": "4.eb6",
"osds": [
34,
36
]
},
{
"pgid": "4.f08",
"osds": [
37,
34
]
},
{
"pgid": "4.f2e",
"osds": [
33,
37
]
},
{
"pgid": "4.f44",
"osds": [
37,
2,
36
]
},
{
"pgid": "4.f46",
"osds": [
29,
37
]
},
{
"pgid": "4.f6f",
"osds": [
29,
36
]
},
{
"pgid": "4.fbc",
"osds": [
29,
37
]
},
{
"pgid": "4.ff1",
"osds": [
37,
14,
36
]
},
{
"pgid": "4.ff6",
"osds": [
29,
36
]
},
{
"pgid": "4.ffc",
"osds": [
29,
37
]
},
{
"pgid": "6.62",
"osds": [
31,
36
]
},
{
"pgid": "6.90",
"osds": [
29,
36
]
},
{
"pgid": "6.191",
"osds": [
31,
36
]
},
{
"pgid": "6.2f5",
"osds": [
37,
22,
36
]
},
{
"pgid": "6.6b8",
"osds": [
37,
11,
36
]
},
{
"pgid": "6.6d1",
"osds": [
0,
37,
36
]
},
{
"pgid": "6.809",
"osds": [
37,
7,
36
]
},
{
"pgid": "6.968",
"osds": [
33,
36
]
},
{
"pgid": "6.996",
"osds": [
37,
23,
36
]
},
{
"pgid": "6.99e",
"osds": [
37,
17,
36
]
},
{
"pgid": "6.a2a",
"osds": [
37,
14,
36
]
},
{
"pgid": "6.a35",
"osds": [
34,
36
]
},
{
"pgid": "6.aa5",
"osds": [
37,
15,
36
]
},
{
"pgid": "6.aef",
"osds": [
29,
36
]
},
{
"pgid": "6.b3b",
"osds": [
29,
36
]
},
{
"pgid": "6.b41",
"osds": [
2,
37,
36
]
},
{
"pgid": "6.bdc",
"osds": [
29,
36
]
},
{
"pgid": "6.c6b",
"osds": [
27,
37,
36
]
},
{
"pgid": "6.cb1",
"osds": [
31,
36
]
},
{
"pgid": "6.cbb",
"osds": [
24,
36
]
},
{
"pgid": "6.dbd",
"osds": [
37,
27,
36
]
},
{
"pgid": "6.e9f",
"osds": [
37,
18,
36
]
},
{
"pgid": "6.ec5",
"osds": [
29,
36
]
},
{
"pgid": "6.f26",
"osds": [
33,
37
]
},
{
"pgid": "6.f7b",
"osds": [
34,
36
]
},
{
"pgid": "6.f8b",
"osds": [
34,
36
]
},
{
"pgid": "6.fda",
"osds": [
33,
36
]
},
{
"pgid": "6.fdc",
"osds": [
33,
36
]
},
{
"pgid": "6.fe0",
"osds": [
30,
36
]
},
{
"pgid": "7.3b",
"osds": [
24,
37
]
},
{
"pgid": "7.52",
"osds": [
31,
36
]
},
{
"pgid": "7.7e",
"osds": [
30
]
},
{
"pgid": "7.87",
"osds": [
31,
37
]
},
{
"pgid": "7.a5",
"osds": [
31,
37
]
},
{
"pgid": "7.ea",
"osds": [
37,
23,
36
]
},
{
"pgid": "7.161",
"osds": [
34,
36
]
},
{
"pgid": "7.163",
"osds": [
31,
37
]
},
{
"pgid": "7.1d4",
"osds": [
29,
36
]
},
{
"pgid": "7.1da",
"osds": [
33,
36
]
},
{
"pgid": "7.1dd",
"osds": [
30,
36
]
},
{
"pgid": "7.1f0",
"osds": [
34,
36
]
},
{
"pgid": "7.1fd",
"osds": [
34,
37,
24
]
},
{
"pgid": "7.374",
"osds": [
37,
22,
36
]
},
{
"pgid": "7.5ea",
"osds": [
37,
23,
36
]
},
{
"pgid": "7.7f4",
"osds": [
33,
36
]
},
{
"pgid": "7.a31",
"osds": [
37,
22,
36
]
},
{
"pgid": "7.a93",
"osds": [
24,
36
]
},
{
"pgid": "7.b2b",
"osds": [
30,
37
]
},
{
"pgid": "7.c34",
"osds": [
34,
36
]
},
{
"pgid": "7.c50",
"osds": [
24,
36
]
},
{
"pgid": "7.cd9",
"osds": [
34,
36
]
},
{
"pgid": "7.d1b",
"osds": [
31,
36
]
},
{
"pgid": "7.d66",
"osds": [
34,
36
]
},
{
"pgid": "7.e20",
"osds": [
30,
37
]
},
{
"pgid": "7.e8f",
"osds": [
29,
36
]
},
{
"pgid": "7.eaa",
"osds": [
37,
17,
36
]
},
{
"pgid": "7.f0b",
"osds": [
33,
36
]
},
{
"pgid": "7.f48",
"osds": [
30,
37
]
},
{
"pgid": "7.fc2",
"osds": [
33,
36
]
},
{
"pgid": "7.fdd",
"osds": [
37,
24
]
},
{
"pgid": "8.11",
"osds": [
31,
36
]
},
{
"pgid": "8.12",
"osds": [
33,
31
]
},
{
"pgid": "8.18",
"osds": [
24,
31
]
},
{
"pgid": "8.1d",
"osds": [
30,
31
]
},
{
"pgid": "8.37",
"osds": [
31,
34
]
},
{
"pgid": "8.5a",
"osds": [
34,
33
]
},
{
"pgid": "8.7c",
"osds": [
31,
36
]
},
{
"pgid": "8.c0",
"osds": [
24,
34
]
},
{
"pgid": "8.c2",
"osds": [
34,
24
]
},
{
"pgid": "8.d3",
"osds": [
37,
17,
36
]
},
{
"pgid": "8.e3",
"osds": [
29,
24
]
},
{
"pgid": "8.ed",
"osds": [
29,
34
]
},
{
"pgid": "8.103",
"osds": [
29,
33
]
},
{
"pgid": "8.146",
"osds": [
29,
30
]
},
{
"pgid": "8.160",
"osds": [
31,
33
]
},
{
"pgid": "8.16f",
"osds": [
29,
24
]
},
{
"pgid": "8.171",
"osds": [
24,
37
]
},
{
"pgid": "8.175",
"osds": [
34,
30
]
},
{
"pgid": "8.17e",
"osds": [
31,
37
]
},
{
"pgid": "8.182",
"osds": [
34,
31
]
},
{
"pgid": "8.18a",
"osds": [
30,
36
]
},
{
"pgid": "8.1a1",
"osds": [
33,
34
]
},
{
"pgid": "8.1a4",
"osds": [
24,
30
]
},
{
"pgid": "8.1ae",
"osds": [
30,
31
]
},
{
"pgid": "8.1b4",
"osds": [
33,
31
]
},
{
"pgid": "8.1c3",
"osds": [
30,
24
]
},
{
"pgid": "8.1c7",
"osds": [
33,
31
]
},
{
"pgid": "8.1ce",
"osds": [
24,
36
]
},
{
"pgid": "8.1d0",
"osds": [
29,
30
]
},
{
"pgid": "8.1f1",
"osds": [
31,
29
]
},
{
"pgid": "8.1f3",
"osds": [
24,
29
]
},
{
"pgid": "8.1f5",
"osds": [
34,
29
]
},
{
"pgid": "8.240",
"osds": [
30,
24
]
},
{
"pgid": "8.25d",
"osds": [
34,
24
]
},
{
"pgid": "8.26b",
"osds": [
31,
33
]
},
{
"pgid": "8.2ad",
"osds": [
34,
24
]
},
{
"pgid": "8.2c5",
"osds": [
31,
24
]
},
{
"pgid": "8.2e3",
"osds": [
33,
36
]
},
{
"pgid": "8.31b",
"osds": [
31,
24
]
},
{
"pgid": "8.36e",
"osds": [
37,
11,
36
]
},
{
"pgid": "8.3c1",
"osds": [
34,
24
]
},
{
"pgid": "8.3c7",
"osds": [
33,
24
]
},
{
"pgid": "16.32a",
"osds": [
33,
36
]
}
],
"primary_temp": [],
"blacklist": [
"2015-04-29 07:13:18.543543",
"2015-04-29 07:11:05.620929",
"2015-04-29 07:07:39.090155"
],
"erasure_code_profiles": {
"default": {
"directory": "\/usr\/lib\/ceph\/erasure-code",
"k": "2",
"m": "1",
"plugin": "jerasure",
"technique": "reed_sol_van"
}
}
}
[-- Attachment #3: Type: text/plain, Size: 178 bytes --]
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
[not found] ` <81216125e573cf00539f61cc090b282b-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
@ 2015-04-29 15:38 ` Sage Weil
[not found] ` <alpine.DEB.2.00.1504290838060.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-04-29 15:38 UTC (permalink / raw)
To: Tuomas Juntunen
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA
On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
> Hi
>
> I updated that version and it seems that something did happen, the osd's
> stayed up for a while and 'ceph status' got updated. But then in a couple of
> minutes, they all went down the same way.
>
> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
> one of the osd's with osd debug = 20,
> http://beta.xaasbox.com/ceph/ceph-osd.15.log
Sam mentioned that you had said earlier that this was not critical data?
If not, I think the simplest thing is to just drop those pools. The
important thing (from my perspective at least :) is that we understand the
root cause and can prevent this in the future.
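For what it's worth, the CLI steps for that would look roughly like the
following (a sketch only; the pool names img/images are taken from earlier in
this thread, and pool deletion is irreversible, so verify against 'ceph osd
lspools' and your own dump before running anything):

  # list each pool's name next to its removed_snaps value from the dump
  ceph osd dump -f json-pretty | grep -E '"pool_name"|"removed_snaps"'
  # undo any tiering relation that is still in place (skip if already removed)
  ceph osd tier remove-overlay img
  ceph osd tier remove img images
  # drop the affected tier pool; this permanently destroys its data
  ceph osd pool delete images images --yes-i-really-really-mean-it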
sage
>
> Thank you!
>
> Br,
> Tuomas
>
>
>
> -----Original Message-----
> From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org]
> Sent: 28. huhtikuuta 2015 23:57
> To: Tuomas Juntunen
> Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
> Hi Tuomas,
>
> I've pushed an updated wip-hammer-snaps branch. Can you please try it?
> The build will appear here
>
>
> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
>
> (or a similar url; adjust for your distro).
>
> Thanks!
> sage
>
>
> On Tue, 28 Apr 2015, Sage Weil wrote:
>
> > [adding ceph-devel]
> >
> > Okay, I see the problem. This seems to be unrelated ot the giant ->
> > hammer move... it's a result of the tiering changes you made:
> >
> > > > > > > > The following:
> > > > > > > >
> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
> > > > > > > > img images
> >
> > Specifically, --force-nonempty bypassed important safety checks.
> >
> > 1. images had snapshots (and removed_snaps)
> >
> > 2. images was added as a tier *of* img, and img's removed_snaps was
> > copied to images, clobbering the removed_snaps value (see
> > OSDMap::Incremental::propagate_snaps_to_tiers)
> >
> > 3. tiering relation was undone, but removed_snaps was still gone
> >
> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
> > with the older map. later, in PGPool::update(), we assume that
> > removed_snaps alwasy grows (never shrinks) and we trigger an assert.
> >
> > To fix this I think we need to do 2 things:
> >
> > 1. make the OSD forgiving out removed_snaps getting smaller. This is
> > probably a good thing anyway: once we know snaps are removed on all
> > OSDs we can prune the interval_set in the OSDMap. Maybe.
> >
> > 2. Fix the mon to prevent this from happening, *even* when
> > --force-nonempty is specified. (This is the root cause.)
> >
> > I've opened http://tracker.ceph.com/issues/11493 to track this.
> >
> > sage
> >
> >
> >
> > > > > > > >
> > > > > > > > Idea was to make images as a tier to img, move data to img
> > > > > > > > then change
> > > > > > > clients to use the new img pool.
> > > > > > > >
> > > > > > > > Br,
> > > > > > > > Tuomas
> > > > > > > >
> > > > > > > > > Can you explain exactly what you mean by:
> > > > > > > > >
> > > > > > > > > "Also I created one pool for tier to be able to move
> > > > > > > > > data without
> > > > > > > outage."
> > > > > > > > >
> > > > > > > > > -Sam
> > > > > > > > > ----- Original Message -----
> > > > > > > > > From: "tuomas juntunen"
> > > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
> > > > > > > > > and after some basic operations most of the OSD's went
> > > > > > > > > down
> > > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > Any solution for this yet?
> > > > > > > > >
> > > > > > > > > Br,
> > > > > > > > > Tuomas
> > > > > > > > >
> > > > > > > > >> It looks like you may have hit
> > > > > > > > >> http://tracker.ceph.com/issues/7915
> > > > > > > > >>
> > > > > > > > >> Ian R. Colle
> > > > > > > > >> Global Director
> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
> > > > > > > > >> http://www.twitter.com/ircolle
> > > > > > > > >> Cell: +1.303.601.7713
> > > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > > > > > > > >>
> > > > > > > > >> ----- Original Message -----
> > > > > > > > >> From: "tuomas juntunen"
> > > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> > > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
> > > > > > > > >> after some basic operations most of the OSD's went down
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > > > > > >>
> > > > > > > > >> Then created new pools and deleted some old ones. Also
> > > > > > > > >> I created one pool for tier to be able to move data
> > > > > > > > >> without
> > > outage.
> > > > > > > > >>
> > > > > > > > >> After these operations all but 10 OSD's are down and
> > > > > > > > >> creating this kind of messages to logs, I get more than
> > > > > > > > >> 100gb of these in a
> > > > > > night:
> > > > > > > > >>
> > > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
> > > pg_epoch:
> > > >
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
> > > > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
> > > > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> > > > > > > > >> n=0
> > > > > > > > >> ec=1 les/c
> > > > > > > > >> 16609/16659
> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> > > > > > > > >> pi=15659-16589/42
> > > > > > > > >> crt=8480'7 lcod
> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> > > > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
> > > > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > > >> inactive NOTIFY] enter Started
> > > > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > > >> inactive NOTIFY] enter Start
> > > > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > > >> inactive NOTIFY]
> > > > > > > > >> state<Start>: transitioning to Stray
> > > > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
> > > > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> > > > > > > > >> les/c
> > > > > > > > >> 17879/17879
> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> > > > > > > > >> inactive NOTIFY] enter Started/Stray
> > > > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
> > > > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive] enter Started
> > > > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive] enter Start
> > > > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive]
> > > > > > > > >> state<Start>: transitioning to Primary
> > > > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
> > > > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive] enter Started/Primary
> > > > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
> > > > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5
> > > > > > > > >> osd.23
> > > > pg_epoch:
> > > > >
> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> > > > > > > > >> 16127/16344
> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
> > > > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> > > > > > > ./include/interval_set.h:
> > > > > > > > >> In
> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> > > snapid_t]'
> > > > > > > > >> thread
> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
> > > > > > > > >> 0)
> > > > > > > > >>
> > > > > > > > >> ceph version 0.94.1
> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
> > > > > > > > >> int, char
> > > > > > > > >> const*)+0x8b)
> > > > > > > > >> [0xbc271b]
> > > > > > > > >> 2:
> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
> > > > > > > > >> >
> > > > > > > > >> const&)+0xb0) [0x82cd50]
> > > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> > > > > > > > >> const>)+0x52e) [0x80113e]
> > > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
> > > > > > > > >> const>std::vector<int,
> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
> > > > > > > > >> std::allocator<int>
> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
> > > > > > > > >> [0x6b0e43]
> > > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
> > > > > > > > >> std::allocator<PG*>
> > > > > > > > >> > const&,
> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
> > > > > > > > >> std::allocator<PG*>
> > > > > > > > >> > const&,
> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> > > > > > > > >> [0xbb38ae]
> > > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
> > > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
> > > > > > > > >>
> > > > > > > > >> Also by monitoring (ceph -w) I get the following
> > > > > > > > >> messages, also lots of
> > > > > > > them.
> > > > > > > > >>
> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> > > > > > > 10.20.0.13:0/1174409'
> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
> > > > > > > > >> create-or-move",
> > > > "args":
> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
>
> > > > > > > > >> dispatch
> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> > > > > > > 10.20.0.13:0/1174483'
> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
> > > > > > > > >> create-or-move",
> > > > "args":
> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
>
> > > > > > > > >> dispatch
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
> 14.04.2.
> > > > > > > > >>
> > > > > > > > >> I have pretty much tried everything I could think of.
> > > > > > > > >>
> > > > > > > > >> Restarting daemons doesn't help.
> > > > > > > > >>
> > > > > > > > >> Any help would be appreciated. I can also provide more
> > > > > > > > >> logs if necessary. They just seem to get pretty large
> > > > > > > > >> in few
> > > moments.
> > > > > > > > >>
> > > > > > > > >> Thank you
> > > > > > > > >> Tuomas
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> _______________________________________________
> > > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > ceph-users mailing list
> > > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > ceph-users mailing list
> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>
^ permalink raw reply [flat|nested] 8+ messages in thread
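The backtrace that closes the message above bottoms out in interval_set<snapid_t>::subtract(), called from PGPool::update(), which assumes a pool's removed_snaps only ever grows from one OSDMap epoch to the next. The following minimal, self-contained C++ sketch shows why a removed_snaps value that shrinks trips that kind of size check; it is a simplified stand-in written for this thread, not the real interval_set from ceph's include/interval_set.h.

#include <cassert>
#include <cstdint>
#include <map>

// Toy stand-in for interval_set<snapid_t>: a map of interval start -> length
// plus a running element count, which is all the failing assert looks at.
struct ToyIntervalSet {
  std::map<uint64_t, uint64_t> m;
  int64_t _size = 0;

  void insert(uint64_t start, uint64_t len) {
    m[start] = len;
    _size += len;
  }

  // Simplified erase: it trusts the caller that the range is present, so only
  // the running size catches an impossible removal, analogous to the
  // "FAILED assert(_size >= 0)" at interval_set.h:385 in the log above.
  void erase(uint64_t start, uint64_t len) {
    m.erase(start);
    _size -= len;
    assert(_size >= 0);
  }

  void subtract(const ToyIntervalSet &other) {
    for (const auto &p : other.m)
      erase(p.first, p.second);
  }
};

int main() {
  ToyIntervalSet cached;   // removed_snaps the PG remembered from the older map
  cached.insert(1, 10);    // snapids 1..10 recorded as removed

  ToyIntervalSet newer;    // removed_snaps after it was clobbered
  newer.insert(1, 4);      // the newer map now claims only snapids 1..4

  // PGPool::update() effectively subtracts the cached value from the new one
  // to find what was removed since the last map; with a shrunken set this
  // drives the size negative and the assert aborts, as the OSDs do here.
  newer.subtract(cached);
  return 0;
}

Running this aborts on the assert, which is the toy analogue of an OSD crashing as soon as it advances to a map in which the set has shrunk.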
* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
[not found] ` <alpine.DEB.2.00.1504290838060.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-30 3:31 ` tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g
[not found] ` <928ebb7320e4eb07f14071e997ed7be2-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g @ 2015-04-30 3:31 UTC (permalink / raw)
To: Sage Weil
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA
Hey
Yes, I can drop the images data. Do you think this will fix it?
Br,
Tuomas
> On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
>> Hi
>>
>> I updated that version and it seems that something did happen: the osd's
>> stayed up for a while and 'ceph status' got updated. But then in a couple of
>> minutes, they all went down the same way.
>>
>> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
>> one of the osd's with osd debug = 20,
>> http://beta.xaasbox.com/ceph/ceph-osd.15.log
>
> Sam mentioned that you had said earlier that this was not critical data?
> If not, I think the simplest thing is to just drop those pools. The
> important thing (from my perspective at least :) is that we understand the
> root cause and can prevent this in the future.
>
> sage
>
>
>>
>> Thank you!
>>
>> Br,
>> Tuomas
>>
>>
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org]
>> Sent: 28 April 2015 23:57
>> To: Tuomas Juntunen
>> Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
>> operations most of the OSD's went down
>>
>> Hi Tuomas,
>>
>> I've pushed an updated wip-hammer-snaps branch. Can you please try it?
>> The build will appear here
>>
>>
>> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
>>
>> (or a similar url; adjust for your distro).
>>
>> Thanks!
>> sage
>>
>>
>> On Tue, 28 Apr 2015, Sage Weil wrote:
>>
>> > [adding ceph-devel]
>> >
>> > Okay, I see the problem.  This seems to be unrelated to the giant ->
>> > hammer move... it's a result of the tiering changes you made:
>> >
>> > > > > > > > The following:
>> > > > > > > >
>> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
>> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
>> > > > > > > > img images
>> >
>> > Specifically, --force-nonempty bypassed important safety checks.
>> >
>> > 1. images had snapshots (and removed_snaps)
>> >
>> > 2. images was added as a tier *of* img, and img's removed_snaps was
>> > copied to images, clobbering the removed_snaps value (see
>> > OSDMap::Incremental::propagate_snaps_to_tiers)
>> >
>> > 3. tiering relation was undone, but removed_snaps was still gone
>> >
>> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
>> > with the older map. later, in PGPool::update(), we assume that
>> > removed_snaps always grows (never shrinks) and we trigger an assert.
>> >
>> > To fix this I think we need to do 2 things:
>> >
>> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
>> > probably a good thing anyway: once we know snaps are removed on all
>> > OSDs we can prune the interval_set in the OSDMap. Maybe.
>> >
>> > 2. Fix the mon to prevent this from happening, *even* when
>> > --force-nonempty is specified. (This is the root cause.)
>> >
>> > I've opened http://tracker.ceph.com/issues/11493 to track this.
>> >
>> > sage
>> >
>> >
>> >
>> > > > > > > >
>> > > > > > > > Idea was to make images as a tier to img, move data to img
>> > > > > > > > then change
>> > > > > > > clients to use the new img pool.
>> > > > > > > >
>> > > > > > > > Br,
>> > > > > > > > Tuomas
>> > > > > > > >
>> > > > > > > > > Can you explain exactly what you mean by:
>> > > > > > > > >
>> > > > > > > > > "Also I created one pool for tier to be able to move
>> > > > > > > > > data without
>> > > > > > > outage."
>> > > > > > > > >
>> > > > > > > > > -Sam
>> > > > > > > > > ----- Original Message -----
>> > > > > > > > > From: "tuomas juntunen"
>> > > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
>> > > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> > > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
>> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
>> > > > > > > > > and after some basic operations most of the OSD's went
>> > > > > > > > > down
>> > > > > > > > >
>> > > > > > > > > Hi
>> > > > > > > > >
>> > > > > > > > > Any solution for this yet?
>> > > > > > > > >
>> > > > > > > > > Br,
>> > > > > > > > > Tuomas
>> > > > > > > > >
>> > > > > > > > >> It looks like you may have hit
>> > > > > > > > >> http://tracker.ceph.com/issues/7915
>> > > > > > > > >>
>> > > > > > > > >> Ian R. Colle
>> > > > > > > > >> Global Director
>> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
>> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
>> > > > > > > > >> http://www.twitter.com/ircolle
>> > > > > > > > >> Cell: +1.303.601.7713
>> > > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
>> > > > > > > > >>
>> > > > > > > > >> ----- Original Message -----
>> > > > > > > > >> From: "tuomas juntunen"
>> > > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
>> > > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
>> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
>> > > > > > > > >> after some basic operations most of the OSD's went down
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>> > > > > > > > >>
>> > > > > > > > >> Then created new pools and deleted some old ones. Also
>> > > > > > > > >> I created one pool for tier to be able to move data
>> > > > > > > > >> without
>> > > outage.
>> > > > > > > > >>
>> > > > > > > > >> After these operations all but 10 OSD's are down and
>> > > > > > > > >> creating this kind of messages to logs, I get more than
>> > > > > > > > >> 100gb of these in a
>> > > > > > night:
>> > > > > > > > >>
>> > > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
>> > > pg_epoch:
>> > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
>> > > > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
>> > > > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
>> > > > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
>> > > > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> > > > > > > > >> n=0
>> > > > > > > > >> ec=1 les/c
>> > > > > > > > >> 16609/16659
>> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> > > > > > > > >> pi=15659-16589/42
>> > > > > > > > >> crt=8480'7 lcod
>> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
>> > > > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
>> > > > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] enter Started
>> > > > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] enter Start
>> > > > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY]
>> > > > > > > > >> state<Start>: transitioning to Stray
>> > > > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
>> > > > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> > > > > > > > >> les/c
>> > > > > > > > >> 17879/17879
>> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> > > > > > > > >> inactive NOTIFY] enter Started/Stray
>> > > > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
>> > > > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Started
>> > > > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Start
>> > > > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive]
>> > > > > > > > >> state<Start>: transitioning to Primary
>> > > > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
>> > > > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Started/Primary
>> > > > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
>> > > > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5
>> > > > > > > > >> osd.23
>> > > > pg_epoch:
>> > > > >
>> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> > > > > > > > >> 16127/16344
>> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
>> > > > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
>> > > > > > > ./include/interval_set.h:
>> > > > > > > > >> In
>> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
>> > > snapid_t]'
>> > > > > > > > >> thread
>> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
>> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
>> > > > > > > > >> 0)
>> > > > > > > > >>
>> > > > > > > > >> ceph version 0.94.1
>> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> > > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
>> > > > > > > > >> int, char
>> > > > > > > > >> const*)+0x8b)
>> > > > > > > > >> [0xbc271b]
>> > > > > > > > >> 2:
>> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
>> > > > > > > > >> >
>> > > > > > > > >> const&)+0xb0) [0x82cd50]
>> > > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
>> > > > > > > > >> const>)+0x52e) [0x80113e]
>> > > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
>> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
>> > > > > > > > >> const>std::vector<int,
>> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
>> > > > > > > > >> std::allocator<int>
>> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
>> > > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
>> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
>> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
>> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
>> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
>> > > > > > > > >> [0x6b0e43]
>> > > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
>> > > > > > > > >> std::allocator<PG*>
>> > > > > > > > >> > const&,
>> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
>> > > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
>> > > > > > > > >> std::allocator<PG*>
>> > > > > > > > >> > const&,
>> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
>> > > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
>> > > > > > > > >> [0xbb38ae]
>> > > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
>> > > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
>> > > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
>> > > > > > > > >>
>> > > > > > > > >> Also by monitoring (ceph -w) I get the following
>> > > > > > > > >> messages, also lots of
>> > > > > > > them.
>> > > > > > > > >>
>> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
>> > > > > > > 10.20.0.13:0/1174409'
>> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
>> > > > > > > > >> create-or-move",
>> > > > "args":
>> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
>>
>> > > > > > > > >> dispatch
>> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
>> > > > > > > 10.20.0.13:0/1174483'
>> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
>> > > > > > > > >> create-or-move",
>> > > > "args":
>> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
>>
>> > > > > > > > >> dispatch
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
>> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
>> 14.04.2.
>> > > > > > > > >>
>> > > > > > > > >> I have pretty much tried everything I could think of.
>> > > > > > > > >>
>> > > > > > > > >> Restarting daemons doesn't help.
>> > > > > > > > >>
>> > > > > > > > >> Any help would be appreciated. I can also provide more
>> > > > > > > > >> logs if necessary. They just seem to get pretty large
>> > > > > > > > >> in few
>> > > moments.
>> > > > > > > > >>
>> > > > > > > > >> Thank you
>> > > > > > > > >> Tuomas
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> _______________________________________________
>> > > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > _______________________________________________
>> > > > > > > > > ceph-users mailing list
>> > > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > _______________________________________________
>> > > > > > > > ceph-users mailing list
>> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > _______________________________________________
>> > > > > > > > ceph-users mailing list
>> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
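The first fix proposed in the plan quoted above, making the OSD forgiving of removed_snaps getting smaller, would amount to computing the newly-removed delta without asserting when the newer set is not a superset of the cached one. The sketch below only illustrates that idea with simplified types (a plain std::set of snapids instead of interval_set) and a made-up helper name; it is not the change in the wip-hammer-snaps branch.

#include <cstdint>
#include <iostream>
#include <set>

using SnapSet = std::set<uint64_t>;  // simplified stand-in for interval_set<snapid_t>

// Tolerant version of "what was removed since the last map?": only snapids
// present in the newer value count, so a clobbered or shrunken removed_snaps
// yields an empty delta (which could be logged) instead of an assert.
SnapSet newly_removed(const SnapSet &cached, const SnapSet &newer) {
  SnapSet delta;
  for (uint64_t snap : newer)
    if (!cached.count(snap))
      delta.insert(snap);
  return delta;
}

int main() {
  SnapSet cached = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};  // what the PG remembered
  SnapSet newer  = {1, 2, 3, 4};                      // shrunk by the tiering change

  if (newer.size() < cached.size())
    std::cout << "removed_snaps shrank; tolerating it instead of asserting\n";
  std::cout << "newly removed snapids: "
            << newly_removed(cached, newer).size() << "\n";  // prints 0
  return 0;
}

The design point is simply that a shrunken set produces an empty delta rather than an aborted OSD, which also leaves room for the OSDMap pruning idea mentioned in the same paragraph.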
* Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
[not found] ` <928ebb7320e4eb07f14071e997ed7be2-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
@ 2015-04-30 15:23 ` Sage Weil
0 siblings, 0 replies; 8+ messages in thread
From: Sage Weil @ 2015-04-30 15:23 UTC (permalink / raw)
To: Tuomas Juntunen
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA
On Thu, 30 Apr 2015, tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org wrote:
> Hey
>
> Yes, I can drop the images data. Do you think this will fix it?
It's a slightly different assert that (I believe) should not trigger once
the pool is deleted. Please give that a try and if you still hit it I'll
whip up a workaround.
Thanks!
sage
>
>
> Br,
>
> Tuomas
>
> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
> >> Hi
> >>
> >> I updated that version and it seems that something did happen: the osd's
> >> stayed up for a while and 'ceph status' got updated. But then in a couple of
> >> minutes, they all went down the same way.
> >>
> >> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
> >> one of the osd's with osd debug = 20,
> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
> >
> > Sam mentioned that you had said earlier that this was not critical data?
> > If not, I think the simplest thing is to just drop those pools. The
> > important thing (from my perspective at least :) is that we understand the
> > root cause and can prevent this in the future.
> >
> > sage
> >
> >
> >>
> >> Thank you!
> >>
> >> Br,
> >> Tuomas
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Sage Weil [mailto:sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org]
> >> Sent: 28 April 2015 23:57
> >> To: Tuomas Juntunen
> >> Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org; ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
> >> operations most of the OSD's went down
> >>
> >> Hi Tuomas,
> >>
> >> I've pushed an updated wip-hammer-snaps branch. Can you please try it?
> >> The build will appear here
> >>
> >>
> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
> >>
> >> (or a similar url; adjust for your distro).
> >>
> >> Thanks!
> >> sage
> >>
> >>
> >> On Tue, 28 Apr 2015, Sage Weil wrote:
> >>
> >> > [adding ceph-devel]
> >> >
> >> > Okay, I see the problem.  This seems to be unrelated to the giant ->
> >> > hammer move... it's a result of the tiering changes you made:
> >> >
> >> > > > > > > > The following:
> >> > > > > > > >
> >> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
> >> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
> >> > > > > > > > img images
> >> >
> >> > Specifically, --force-nonempty bypassed important safety checks.
> >> >
> >> > 1. images had snapshots (and removed_snaps)
> >> >
> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
> >> > copied to images, clobbering the removed_snaps value (see
> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
> >> >
> >> > 3. tiering relation was undone, but removed_snaps was still gone
> >> >
> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
> >> > with the older map. later, in PGPool::update(), we assume that
> >> > removed_snaps always grows (never shrinks) and we trigger an assert.
> >> >
> >> > To fix this I think we need to do 2 things:
> >> >
> >> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
> >> > probably a good thing anyway: once we know snaps are removed on all
> >> > OSDs we can prune the interval_set in the OSDMap. Maybe.
> >> >
> >> > 2. Fix the mon to prevent this from happening, *even* when
> >> > --force-nonempty is specified. (This is the root cause.)
> >> >
> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
> >> >
> >> > sage
> >> >
> >> >
> >> >
> >> > > > > > > >
> >> > > > > > > > Idea was to make images as a tier to img, move data to img
> >> > > > > > > > then change
> >> > > > > > > clients to use the new img pool.
> >> > > > > > > >
> >> > > > > > > > Br,
> >> > > > > > > > Tuomas
> >> > > > > > > >
> >> > > > > > > > > Can you explain exactly what you mean by:
> >> > > > > > > > >
> >> > > > > > > > > "Also I created one pool for tier to be able to move
> >> > > > > > > > > data without
> >> > > > > > > outage."
> >> > > > > > > > >
> >> > > > > > > > > -Sam
> >> > > > > > > > > ----- Original Message -----
> >> > > > > > > > > From: "tuomas juntunen"
> >> > > > > > > > > <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> >> > > > > > > > > To: "Ian Colle" <icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> > > > > > > > > Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
> >> > > > > > > > > and after some basic operations most of the OSD's went
> >> > > > > > > > > down
> >> > > > > > > > >
> >> > > > > > > > > Hi
> >> > > > > > > > >
> >> > > > > > > > > Any solution for this yet?
> >> > > > > > > > >
> >> > > > > > > > > Br,
> >> > > > > > > > > Tuomas
> >> > > > > > > > >
> >> > > > > > > > >> It looks like you may have hit
> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
> >> > > > > > > > >>
> >> > > > > > > > >> Ian R. Colle
> >> > > > > > > > >> Global Director
> >> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
> >> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
> >> > > > > > > > >> http://www.twitter.com/ircolle
> >> > > > > > > > >> Cell: +1.303.601.7713
> >> > > > > > > > >> Email: icolle-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> >> > > > > > > > >>
> >> > > > > > > > >> ----- Original Message -----
> >> > > > > > > > >> From: "tuomas juntunen"
> >> > > > > > > > >> <tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g@public.gmane.org>
> >> > > > > > > > >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
> >> > > > > > > > >> after some basic operations most of the OSD's went down
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> >> > > > > > > > >>
> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
> >> > > > > > > > >> I created one pool for tier to be able to move data
> >> > > > > > > > >> without
> >> > > outage.
> >> > > > > > > > >>
> >> > > > > > > > >> After these operations all but 10 OSD's are down and
> >> > > > > > > > >> creating this kind of messages to logs, I get more than
> >> > > > > > > > >> 100gb of these in a
> >> > > > > > night:
> >> > > > > > > > >>
> >> > > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
> >> > > pg_epoch:
> >> > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
> >> > > > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
> >> > > > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> >> > > > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> >> > > > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
> >> > > > > > > > >> n=0
> >> > > > > > > > >> ec=1 les/c
> >> > > > > > > > >> 16609/16659
> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
> >> > > > > > > > >> pi=15659-16589/42
> >> > > > > > > > >> crt=8480'7 lcod
> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
> >> > > > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
> >> > > > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] enter Started
> >> > > > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] enter Start
> >> > > > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY]
> >> > > > > > > > >> state<Start>: transitioning to Stray
> >> > > > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
> >> > > > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
> >> > > > > > > > >> les/c
> >> > > > > > > > >> 17879/17879
> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
> >> > > > > > > > >> inactive NOTIFY] enter Started/Stray
> >> > > > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
> >> > > > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Started
> >> > > > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Start
> >> > > > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive]
> >> > > > > > > > >> state<Start>: transitioning to Primary
> >> > > > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
> >> > > > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Started/Primary
> >> > > > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
> >> > > > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5
> >> > > > > > > > >> osd.23
> >> > > > pg_epoch:
> >> > > > >
> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
> >> > > > > > > > >> 16127/16344
> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
> >> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
> >> > > > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
> >> > > > > > > ./include/interval_set.h:
> >> > > > > > > > >> In
> >> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
> >> > > snapid_t]'
> >> > > > > > > > >> thread
> >> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
> >> > > > > > > > >> 0)
> >> > > > > > > > >>
> >> > > > > > > > >> ceph version 0.94.1
> >> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >> > > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
> >> > > > > > > > >> int, char
> >> > > > > > > > >> const*)+0x8b)
> >> > > > > > > > >> [0xbc271b]
> >> > > > > > > > >> 2:
> >> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
> >> > > > > > > > >> >
> >> > > > > > > > >> const&)+0xb0) [0x82cd50]
> >> > > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
> >> > > > > > > > >> const>)+0x52e) [0x80113e]
> >> > > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
> >> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
> >> > > > > > > > >> const>std::vector<int,
> >> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
> >> > > > > > > > >> std::allocator<int>
> >> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> >> > > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
> >> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
> >> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
> >> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
> >> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
> >> > > > > > > > >> [0x6b0e43]
> >> > > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
> >> > > > > > > > >> std::allocator<PG*>
> >> > > > > > > > >> > const&,
> >> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> >> > > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
> >> > > > > > > > >> std::allocator<PG*>
> >> > > > > > > > >> > const&,
> >> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
> >> > > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
> >> > > > > > > > >> [0xbb38ae]
> >> > > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> >> > > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
> >> > > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
> >> > > > > > > > >>
> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
> >> > > > > > > > >> messages, also lots of
> >> > > > > > > them.
> >> > > > > > > > >>
> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
> >> > > > > > > 10.20.0.13:0/1174409'
> >> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
> >> > > > > > > > >> create-or-move",
> >> > > > "args":
> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]:
> >>
> >> > > > > > > > >> dispatch
> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
> >> > > > > > > 10.20.0.13:0/1174483'
> >> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
> >> > > > > > > > >> create-or-move",
> >> > > > "args":
> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]:
> >>
> >> > > > > > > > >> dispatch
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
> >> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
> >> 14.04.2.
> >> > > > > > > > >>
> >> > > > > > > > >> I have pretty much tried everything I could think of.
> >> > > > > > > > >>
> >> > > > > > > > >> Restarting daemons doesn't help.
> >> > > > > > > > >>
> >> > > > > > > > >> Any help would be appreciated. I can also provide more
> >> > > > > > > > >> logs if necessary. They just seem to get pretty large
> >> > > > > > > > >> in few
> >> > > moments.
> >> > > > > > > > >>
> >> > > > > > > > >> Thank you
> >> > > > > > > > >> Tuomas
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> _______________________________________________
> >> > > > > > > > >> ceph-users mailing list ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > _______________________________________________
> >> > > > > > > > > ceph-users mailing list
> >> > > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > _______________________________________________
> >> > > > > > > > ceph-users mailing list
> >> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > _______________________________________________
> >> > > > > > > > ceph-users mailing list
> >> > > > > > > > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >> >
> >>
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
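Step 2 of the root cause quoted above, img's removed_snaps being copied onto images via OSDMap::Incremental::propagate_snaps_to_tiers, is in effect a plain overwrite of the tier pool's snap history. The toy C++ sketch below illustrates that clobbering; the pool names follow the thread, but the types and the propagate_snaps_to_tier helper are invented for illustration and are not the real OSDMap code.

#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <string>

struct ToyPool {
  std::set<uint64_t> removed_snaps;  // simplified snap history for one pool
};

// When a tier relationship is created, the base pool's snap state is pushed
// onto the tier by assignment, so whatever the tier already had is lost.
void propagate_snaps_to_tier(const ToyPool &base, ToyPool &tier) {
  tier.removed_snaps = base.removed_snaps;  // clobbers the tier's own value
}

int main() {
  std::map<std::string, ToyPool> pools;
  pools["images"].removed_snaps = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};  // had snapshots
  pools["img"].removed_snaps = {};                                  // new, empty pool

  // "ceph osd tier add img images --force-nonempty" style relationship:
  // images becomes a tier of img, and img's empty removed_snaps wins.
  propagate_snaps_to_tier(pools["img"], pools["images"]);

  std::cout << "images.removed_snaps now holds "
            << pools["images"].removed_snaps.size() << " snapids\n";  // prints 0
  return 0;
}

Undoing the tier relationship afterwards restores nothing, which is why the OSDs later load a map whose removed_snaps appears to have shrunk.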
* Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
2015-05-01 16:04 ` Sage Weil
@ 2015-05-01 18:13 ` tuomas.juntunen
0 siblings, 0 replies; 8+ messages in thread
From: tuomas.juntunen @ 2015-05-01 18:13 UTC (permalink / raw)
To: Sage Weil; +Cc: tuomas.juntunen, ceph-users, ceph-devel
Thanks, I'll do this when the commit is available and report back.
And indeed, I'll change back to the official packages once everything is OK.
Br,
Tuomas
> On Fri, 1 May 2015, tuomas.juntunen@databasement.fi wrote:
>> Hi
>>
>> I deleted the images and img pools and started the osd's, but they still die.
>>
>> Here's a log of one of the osd's after this, if you need it.
>>
>> http://beta.xaasbox.com/ceph/ceph-osd.19.log
>
> I've pushed another commit that should avoid this case, sha1
> 425bd4e1dba00cc2243b0c27232d1f9740b04e34.
>
> Note that once the pools are fully deleted (shouldn't take too long once
> the osds are up and stabilize) you should switch back to the normal
> packages that don't have these workarounds.
>
> sage
>
>
>
>>
>> Br,
>> Tuomas
>>
>>
>> > Thanks man. I'll try it tomorrow. Have a good one.
>> >
>> > Br,T
>> >
>> > -------- Original message --------
>> > From: Sage Weil <sage@newdream.net>
>> > Date: 30/04/2015 18:23 (GMT+02:00)
>> > To: Tuomas Juntunen <tuomas.juntunen@databasement.fi>
>> > Cc: ceph-users@lists.ceph.com, ceph-devel@vger.kernel.org
>> > Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic
>>
>> > operations most of the OSD's went down
>> >
>> > On Thu, 30 Apr 2015, tuomas.juntunen@databasement.fi wrote:
>> >> Hey
>> >>
>> >> Yes, I can drop the images data. Do you think this will fix it?
>> >
>> > It's a slightly different assert that (I believe) should not trigger once
>> > the pool is deleted. Please give that a try and if you still hit it I'll
>> > whip up a workaround.
>> >
>> > Thanks!
>> > sage
>> >
>> > >
>> >>
>> >> Br,
>> >>
>> >> Tuomas
>> >>
>> >> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
>> >> >> Hi
>> >> >>
>> >> >> I updated that version and it seems that something did happen: the osd's
>> >> >> stayed up for a while and 'ceph status' got updated. But then in a couple of
>> >> >> minutes, they all went down the same way.
>> >> >>
>> >> >> I have attached new 'ceph osd dump -f json-pretty' and got a new log
>> from
>> >> >> one of the osd's with osd debug = 20,
>> >> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
>> >> >
>> >> > Sam mentioned that you had said earlier that this was not critical data?
>> >> > If not, I think the simplest thing is to just drop those pools. The
>> >> > important thing (from my perspective at least :) is that we understand
>> the
>> >> > root cause and can prevent this in the future.
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >>
>> >> >> Thank you!
>> >> >>
>> >> >> Br,
>> >> >> Tuomas
>> >> >>
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Sage Weil [mailto:sage@newdream.net]
>> >> >> Sent: 28 April 2015 23:57
>> >> >> To: Tuomas Juntunen
>> >> >> Cc: ceph-users@lists.ceph.com; ceph-devel@vger.kernel.org
>> >> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some
>> basic
>> >> >> operations most of the OSD's went down
>> >> >>
>> >> >> Hi Tuomas,
>> >> >>
>> >> >> I've pushed an updated wip-hammer-snaps branch. Can you please try it?
>> >> >> The build will appear here
>> >> >>
>> >> >>
>> >> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
>> >> >>
>> >> >> (or a similar url; adjust for your distro).
>> >> >>
>> >> >> Thanks!
>> >> >> sage
>> >> >>
>> >> >>
>> >> >> On Tue, 28 Apr 2015, Sage Weil wrote:
>> >> >>
>> >> >> > [adding ceph-devel]
>> >> >> >
>> >> >> > Okay, I see the problem.  This seems to be unrelated to the giant ->
>> >> >> > hammer move... it's a result of the tiering changes you made:
>> >> >> >
>> >> >> > > > > > > > The following:
>> >> >> > > > > > > >
>> >> >> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
>> >> >> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
>> >> >> > > > > > > > img images
>> >> >> >
>> >> >> > Specifically, --force-nonempty bypassed important safety checks.
>> >> >> >
>> >> >> > 1. images had snapshots (and removed_snaps)
>> >> >> >
>> >> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
>> >> >> > copied to images, clobbering the removed_snaps value (see
>> >> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
>> >> >> >
>> >> >> > 3. tiering relation was undone, but removed_snaps was still gone
>> >> >> >
>> >> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
>> >> >> > with the older map. later, in PGPool::update(), we assume that
>> >> >> > removed_snaps always grows (never shrinks) and we trigger an assert.
>> >> >> >
>> >> >> > To fix this I think we need to do 2 things:
>> >> >> >
>> >> >> > 1. make the OSD forgiving of removed_snaps getting smaller.  This is
>> >> >> > probably a good thing anyway: once we know snaps are removed on all
>> >> >> > OSDs we can prune the interval_set in the OSDMap. Maybe.
>> >> >> >
>> >> >> > 2. Fix the mon to prevent this from happening, *even* when
>> >> >> > --force-nonempty is specified. (This is the root cause.)
>> >> >> >
>> >> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > > > > > > >
>> >> >> > > > > > > > Idea was to make images as a tier to img, move data to img
>> >> >> > > > > > > > then change
>> >> >> > > > > > > clients to use the new img pool.
>> >> >> > > > > > > >
>> >> >> > > > > > > > Br,
>> >> >> > > > > > > > Tuomas
>> >> >> > > > > > > >
>> >> >> > > > > > > > > Can you explain exactly what you mean by:
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > "Also I created one pool for tier to be able to move
>> >> >> > > > > > > > > data without
>> >> >> > > > > > > outage."
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > -Sam
>> >> >> > > > > > > > > ----- Original Message -----
>> >> >> > > > > > > > > From: "tuomas juntunen"
>> >> >> > > > > > > > > <tuomas.juntunen@databasement.fi>
>> >> >> > > > > > > > > To: "Ian Colle" <icolle@redhat.com>
>> >> >> > > > > > > > > Cc: ceph-users@lists.ceph.com
>> >> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
>> >> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
>> >> >> > > > > > > > > and after some basic operations most of the OSD's went
>> >> >> > > > > > > > > down
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > Hi
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > Any solution for this yet?
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > Br,
>> >> >> > > > > > > > > Tuomas
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >> It looks like you may have hit
>> >> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Ian R. Colle
>> >> >> > > > > > > > >> Global Director
>> >> >> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
>> >> >> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
>> >> >> > > > > > > > >> http://www.twitter.com/ircolle
>> >> >> > > > > > > > >> Cell: +1.303.601.7713
>> >> >> > > > > > > > >> Email: icolle@redhat.com
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> ----- Original Message -----
>> >> >> > > > > > > > >> From: "tuomas juntunen"
>> >> >> > > > > > > > >> <tuomas.juntunen@databasement.fi>
>> >> >> > > > > > > > >> To: ceph-users@lists.ceph.com
>> >> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
>> >> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
>> >> >> > > > > > > > >> after some basic operations most of the OSD's went down
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
>> >> >> > > > > > > > >> I created one pool for tier to be able to move data
>> >> >> > > > > > > > >> without
>> >> >> > > outage.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> After these operations all but 10 OSD's are down and
>> >> >> > > > > > > > >> creating this kind of messages to logs, I get more than
>> >> >> > > > > > > > >> 100gb of these in a
>> >> >> > > > > > night:
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>Â -19> 2015-04-27 10:17:08.808584 7fd8e748d700Â 5
>> osd.23
>> >> >> > > pg_epoch:
>> >> >> > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
>> >> >> > > > > > > > >>Â Â Â -18> 2015-04-27 10:17:08.808596 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
>> >> >> > > > > > > > >>Â Â Â -17> 2015-04-27 10:17:08.808608 7fd8e748d700Â 1
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to
>> Stray
>> >> >> > > > > > > > >>Â Â Â -16> 2015-04-27 10:17:08.808621 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
>> >> >> > > > > > > > >>Â Â Â -15> 2015-04-27 10:17:08.808637 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> >> > > > > > > > >> n=0
>> >> >> > > > > > > > >> ec=1 les/c
>> >> >> > > > > > > > >> 16609/16659
>> >> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> >> > > > > > > > >> pi=15659-16589/42
>> >> >> > > > > > > > >> crt=8480'7 lcod
>> >> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
>> >> >> > > > > > > > >>Â Â Â -14> 2015-04-27 10:17:08.808796 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
>> >> >> > > > > > > > >>Â Â Â -13> 2015-04-27 10:17:08.808817 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] enter Started
>> >> >> > > > > > > > >>Â Â Â -12> 2015-04-27 10:17:08.808828 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] enter Start
>> >> >> > > > > > > > >>Â Â Â -11> 2015-04-27 10:17:08.808838 7fd8e748d700Â 1
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY]
>> >> >> > > > > > > > >> state<Start>: transitioning to Stray
>> >> >> > > > > > > > >>Â Â Â -10> 2015-04-27 10:17:08.808849 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
>> >> >> > > > > > > > >>Â Â Â Â -9> 2015-04-27 10:17:08.808861 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> >> > > > > > > > >> les/c
>> >> >> > > > > > > > >> 17879/17879
>> >> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> >> > > > > > > > >> inactive NOTIFY] enter Started/Stray
>> >> >> > > > > > > > >>Â Â Â Â -8> 2015-04-27 10:17:08.809427 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
>> >> >> > > > > > > > >>Â Â Â Â -7> 2015-04-27 10:17:08.809445 7fd8e748d700Â 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Started
>> >> >> > > > > > > > >>    -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Start
>> >> >> > > > > > > > >>    -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive]
>> >> >> > > > > > > > >> state<Start>: transitioning to Primary
>> >> >> > > > > > > > >>    -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
>> >> >> > > > > > > > >>    -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Started/Primary
>> >> >> > > > > > > > >>    -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
>> >> >> > > > > > > > >>    -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5
>> >> >> > > > > > > > >> osd.23
>> >> >> > > > pg_epoch:
>> >> >> > > > >
>> >> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> >> > > > > > > > >> 16127/16344
>> >> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> >> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
>> >> >> > > > > > > > >>     0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
>> >> >> > > > > > > ./include/interval_set.h:
>> >> >> > > > > > > > >> In
>> >> >> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
>> >> >> > > snapid_t]'
>> >> >> > > > > > > > >> thread
>> >> >> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
>> >> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
>> >> >> > > > > > > > >> 0)
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> ceph version 0.94.1
>> >> >> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> >> >> > > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char
>> const*,
>> >> >> > > > > > > > >> int, char
>> >> >> > > > > > > > >> const*)+0x8b)
>> >> >> > > > > > > > >> [0xbc271b]
>> >> >> > > > > > > > >> 2:
>> >> >> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
>> >> >> > > > > > > > >> >
>> >> >> > > > > > > > >> const&)+0xb0) [0x82cd50]
>> >> >> > > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
>> >> >> > > > > > > > >> const>)+0x52e) [0x80113e]
>> >> >> > > > > > > > >> 4:
>> (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
>> >> >> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
>> >> >> > > > > > > > >> const>std::vector<int,
>> >> >> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
>> >> >> > > > > > > > >> std::allocator<int>
>> >> >> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
>> >> >> > > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
>> >> >> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
>> >> >> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
>> >> >> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
>> >> >> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
>> >> >> > > > > > > > >> [0x6b0e43]
>> >> >> > > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
>> >> >> > > > > > > > >> std::allocator<PG*>
>> >> >> > > > > > > > >> > const&,
>> >> >> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
>> >> >> > > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
>> >> >> > > > > > > > >> std::allocator<PG*>
>> >> >> > > > > > > > >> > const&,
>> >> >> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
>> >> >> > > > > > > > >> 8:
>> (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
>> >> >> > > > > > > > >> [0xbb38ae]
>> >> >> > > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
>> >> >> > > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
>> >> >> > > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
>> >> >> > > > > > > > >>
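To make the failure above concrete: the assert fires inside
interval_set<snapid_t>::subtract() (frame 2), called from PGPool::update()
(frame 3), which effectively computes "snaps removed since the last map" by
subtracting the PG's cached removed_snaps from the new OSDMap's set, assuming
that set only ever grows. Below is a minimal, self-contained toy sketch -- a
stand-in, not Ceph's real interval_set -- of how that subtraction drives a
cached size negative and trips an assert of the same shape as
interval_set.h:385 once the map's set has shrunk:

#include <cassert>
#include <set>

// Toy stand-in for interval_set<snapid_t>: a set of snap ids plus a cached
// element count, mirroring the real class's _size member.
struct toy_interval_set {
  std::set<int> vals;
  long _size = 0;

  void insert(int v) { if (vals.insert(v).second) ++_size; }

  void erase(int v) {
    _size -= 1;              // the real erase() subtracts the interval length
    vals.erase(v);
    assert(_size >= 0);      // analogous to "FAILED assert(_size >= 0)"
  }

  // subtract() assumes 'other' is contained in *this, as the real code does.
  void subtract(const toy_interval_set& other) {
    for (int v : other.vals) erase(v);
  }
};

int main() {
  toy_interval_set cached;          // removed_snaps the PG loaded at startup
  cached.insert(1);
  cached.insert(2);
  cached.insert(3);

  toy_interval_set from_new_map;    // removed_snaps in the new OSDMap, now
  from_new_map.insert(1);           // smaller than before after the clobber

  toy_interval_set newly_removed = from_new_map;
  newly_removed.subtract(cached);   // erasing 2 drives _size below 0: abort
  return 0;
}

Compiled with assertions enabled, the last subtract() aborts; that is the same
shape of failure the OSD hits when the OSDMap's removed_snaps has shrunk.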
>> >> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
>> >> >> > > > > > > > >> messages, also lots of
>> >> >> > > > > > > them.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
>> >> >> > > > > > > 10.20.0.13:0/1174409'
>> >> >> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
>> >> >> > > > > > > > >> create-or-move",
>> >> >> > > > "args":
>> >> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight":
>> >> 1.82}]:
>> >> >>
>> >> >> > > > > > > > >> dispatch
>> >> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
>> >> >> > > > > > > 10.20.0.13:0/1174483'
>> >> >> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
>> >> >> > > > > > > > >> create-or-move",
>> >> >> > > > "args":
>> >> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight":
>> >> 1.82}]:
>> >> >>
>> >> >> > > > > > > > >> dispatch
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
>> >> >> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
>> >> >> 14.04.2.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> I have pretty much tried everything I could think of.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Restarting daemons doesn't help.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Any help would be appreciated. I can also provide more
>> >> >> > > > > > > > >> logs if necessary. They just seem to get pretty large
>> >> >> > > > > > > > >> in few
>> >> >> > > moments.
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> Thank you
>> >> >> > > > > > > > >> Tuomas
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >> _______________________________________________
>> >> >> > > > > > > > >> ceph-users mailing list ceph-users@lists.ceph.com
>> >> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >>
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > _______________________________________________
>> >> >> > > > > > > > > ceph-users mailing list
>> >> >> > > > > > > > > ceph-users@lists.ceph.com
>> >> >> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > > _______________________________________________
>> >> >> > > > > > > > ceph-users mailing list
>> >> >> > > > > > > > ceph-users@lists.ceph.com
>> >> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > > > _______________________________________________
>> >> >> > > > > > > > ceph-users mailing list
>> >> >> > > > > > > > ceph-users@lists.ceph.com
>> >> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> > > > > > > >
>> >> >> > > > > > > >
>> >> >> > > > > > >
>> >> >> > > > > >
>> >> >> > > > > >
>> >> >> > > > >
>> >> >> > > > >
>> >> >> > > >
>> >> >> > >
>> >> >> > >
>> >> >> > _______________________________________________
>> >> >> > ceph-users mailing list
>> >> >> > ceph-users@lists.ceph.com
>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> >
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >>
>> >>
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down
2015-04-30 17:27 tuomas.juntunen
@ 2015-05-01 15:10 ` tuomas.juntunen
2015-05-01 16:04 ` Sage Weil
0 siblings, 1 reply; 8+ messages in thread
From: tuomas.juntunen @ 2015-05-01 15:10 UTC (permalink / raw)
To: tuomas.juntunen; +Cc: Sage Weil, ceph-users, ceph-devel
Hi
I deleted the images and img pools and started the osd's, but they still die.
Here's a log of one of the osd's after this, if you need it.
http://beta.xaasbox.com/ceph/ceph-osd.19.log
Br,
Tuomas
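For reference, dropping those pools on Hammer would normally look roughly like
the following (a sketch of the usual commands, not a transcript of what was
actually run; the tiering relationship itself had already been undone earlier
in the thread):

  ceph osd pool delete images images --yes-i-really-really-mean-it
  ceph osd pool delete img img --yes-i-really-really-mean-it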
> Thanks man. I'll try it tomorrow. Have a good one.
>
> Br,T
>
> -------- Original message --------
> From: Sage Weil <sage@newdream.net>
> Date: 30/04/2015 18:23 (GMT+02:00)
> To: Tuomas Juntunen <tuomas.juntunen@databasement.fi>
> Cc: ceph-users@lists.ceph.com, ceph-devel@vger.kernel.org
> Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
> On Thu, 30 Apr 2015, tuomas.juntunen@databasement.fi wrote:
>> Hey
>>
>> Yes I can drop the images data, you think this will fix it?
>
> It's a slightly different assert that (I believe) should not trigger once
> the pool is deleted. Please give that a try and if you still hit it I'll
> whip up a workaround.
>
> Thanks!
> sage
>
> >
>>
>> Br,
>>
>> Tuomas
>>
>> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
>> >> Hi
>> >>
>> >> I updated that version and it seems that something did happen, the osd's
>> >> stayed up for a while and 'ceph status' got updated. But then in couple of
>> >> minutes, they all went down the same way.
>> >>
>> >> I have attached new 'ceph osd dump -f json-pretty' and got a new log from
>> >> one of the osd's with osd debug = 20,
>> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
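For anyone reproducing this, "osd debug = 20" is typically enabled either in
ceph.conf or injected into a running daemon (a sketch; the thread does not say
which method was used):

  # ceph.conf on the OSD host
  [osd]
      debug osd = 20

  # or at runtime, for the OSD whose log is linked above
  ceph tell osd.15 injectargs '--debug-osd 20'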
>> >
>> > Sam mentioned that you had said earlier that this was not critical data?
>> > If not, I think the simplest thing is to just drop those pools. The
>> > important thing (from my perspective at least :) is that we understand the
>> > root cause and can prevent this in the future.
>> >
>> > sage
>> >
>> >
>> >>
>> >> Thank you!
>> >>
>> >> Br,
>> >> Tuomas
>> >>
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Sage Weil [mailto:sage@newdream.net]
>> >> Sent: 28. huhtikuuta 2015 23:57
>> >> To: Tuomas Juntunen
>> >> Cc: ceph-users@lists.ceph.com; ceph-devel@vger.kernel.org
>> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some basic
>> >> operations most of the OSD's went down
>> >>
>> >> Hi Tuomas,
>> >>
>> >> I've pushed an updated wip-hammer-snaps branch. Can you please try it?
>> >> The build will appear here
>> >>
>> >>
>> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e
>> >> 2eb514067f72afda11bcde286
>> >>
>> >> (or a similar url; adjust for your distro).
>> >>
>> >> Thanks!
>> >> sage
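A note for readers: the per-sha1 gitbuilder directories were served as plain
apt repositories, so on trusty a test build like this could usually be pulled
in along these lines (a sketch under that assumption; repo layout and package
names may differ):

  echo deb http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286 trusty main \
    | sudo tee /etc/apt/sources.list.d/ceph-wip.list
  sudo apt-get update && sudo apt-get install --only-upgrade ceph ceph-common
  # then restart the ceph-osd daemons to pick up the new binaries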
>> >>
>> >>
>> >> On Tue, 28 Apr 2015, Sage Weil wrote:
>> >>
>> >> > [adding ceph-devel]
>> >> >
>> >> > Okay, I see the problem. This seems to be unrelated ot the giant ->
>> >> > hammer move... it's a result of the tiering changes you made:
>> >> >
>> >> > > > > > > > The following:
>> >> > > > > > > >
>> >> > > > > > > > ceph osd tier add img images --force-nonempty ceph osd
>> >> > > > > > > > tier cache-mode images forward ceph osd tier set-overlay
>> >> > > > > > > > img images
>> >> >
>> >> > Specifically, --force-nonempty bypassed important safety checks.
>> >> >
>> >> > 1. images had snapshots (and removed_snaps)
>> >> >
>> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
>> >> > copied to images, clobbering the removed_snaps value (see
>> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
>> >> >
>> >> > 3. tiering relation was undone, but removed_snaps was still gone
>> >> >
>> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
>> >> > with the older map. later, in PGPool::update(), we assume that
>> >> > removed_snaps alwasy grows (never shrinks) and we trigger an assert.
>> >> >
>> >> > To fix this I think we need to do 2 things:
>> >> >
>> >> > 1. make the OSD forgiving out removed_snaps getting smaller. This is
>> >> > probably a good thing anyway: once we know snaps are removed on all
>> >> > OSDs we can prune the interval_set in the OSDMap. Maybe.
>> >> >
>> >> > 2. Fix the mon to prevent this from happening, *even* when
>> >> > --force-nonempty is specified. (This is the root cause.)
>> >> >
>> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >
>> >> > > > > > > >
>> >> > > > > > > > Idea was to make images as a tier to img, move data to img
>> >> > > > > > > > then change
>> >> > > > > > > clients to use the new img pool.
>> >> > > > > > > >
>> >> > > > > > > > Br,
>> >> > > > > > > > Tuomas
>> >> > > > > > > >
>> >> > > > > > > > > Can you explain exactly what you mean by:
>> >> > > > > > > > >
>> >> > > > > > > > > "Also I created one pool for tier to be able to move
>> >> > > > > > > > > data without
>> >> > > > > > > outage."
>> >> > > > > > > > >
>> >> > > > > > > > > -Sam
>> >> > > > > > > > > ----- Original Message -----
>> >> > > > > > > > > From: "tuomas juntunen"
>> >> > > > > > > > > <tuomas.juntunen@databasement.fi>
>> >> > > > > > > > > To: "Ian Colle" <icolle@redhat.com>
>> >> > > > > > > > > Cc: ceph-users@lists.ceph.com
>> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
>> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant to Hammer
>> >> > > > > > > > > and after some basic operations most of the OSD's went
>> >> > > > > > > > > down
>> >> > > > > > > > >
>> >> > > > > > > > > Hi
>> >> > > > > > > > >
>> >> > > > > > > > > Any solution for this yet?
>> >> > > > > > > > >
>> >> > > > > > > > > Br,
>> >> > > > > > > > > Tuomas
>> >> > > > > > > > >
>> >> > > > > > > > >> It looks like you may have hit
>> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
>> >> > > > > > > > >>
>> >> > > > > > > > >> Ian R. Colle
>> >> > > > > > > > >> Global Director
>> >> > > > > > > > >> of Software Engineering Red Hat (Inktank is now part of
>> >> > > > > > > > >> Red Hat!) http://www.linkedin.com/in/ircolle
>> >> > > > > > > > >> http://www.twitter.com/ircolle
>> >> > > > > > > > >> Cell: +1.303.601.7713
>> >> > > > > > > > >> Email: icolle@redhat.com
>> >> > > > > > > > >>
>> >> > > > > > > > >> ----- Original Message -----
>> >> > > > > > > > >> From: "tuomas juntunen"
>> >> > > > > > > > >> <tuomas.juntunen@databasement.fi>
>> >> > > > > > > > >> To: ceph-users@lists.ceph.com
>> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
>> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to Hammer and
>> >> > > > > > > > >> after some basic operations most of the OSD's went down
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
>> >> > > > > > > > >>
>> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
>> >> > > > > > > > >> I created one pool for tier to be able to move data
>> >> > > > > > > > >> without
>> >> > > outage.
>> >> > > > > > > > >>
>> >> > > > > > > > >> After these operations all but 10 OSD's are down and
>> >> > > > > > > > >> creating this kind of messages to logs, I get more than
>> >> > > > > > > > >> 100gb of these in a
>> >> > > > > > night:
>> >> > > > > > > > >>
>> >> > > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700 5 osd.23
>> >> > > pg_epoch:
>> >> > > >
>> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> > > > > > > > >> n=0
>> >> > > > > > > > >> ec=1 les/c
>> >> > > > > > > > >> 16609/16659
>> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> > > > > > > > >> pi=15659-16589/42
>> >> > > > > > > > >> crt=8480'7 lcod
>> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started
>> >> > > > > > > > >>   -18> 2015-04-27 10:17:08.808596 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> > > > > > > > >> n=0
>> >> > > > > > > > >> ec=1 les/c
>> >> > > > > > > > >> 16609/16659
>> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> > > > > > > > >> pi=15659-16589/42
>> >> > > > > > > > >> crt=8480'7 lcod
>> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Start
>> >> > > > > > > > >>   -17> 2015-04-27 10:17:08.808608 7fd8e748d700 1
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> > > > > > > > >> n=0
>> >> > > > > > > > >> ec=1 les/c
>> >> > > > > > > > >> 16609/16659
>> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> > > > > > > > >> pi=15659-16589/42
>> >> > > > > > > > >> crt=8480'7 lcod
>> >> > > > > > > > >> 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
>> >> > > > > > > > >>   -16> 2015-04-27 10:17:08.808621 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> > > > > > > > >> n=0
>> >> > > > > > > > >> ec=1 les/c
>> >> > > > > > > > >> 16609/16659
>> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> > > > > > > > >> pi=15659-16589/42
>> >> > > > > > > > >> crt=8480'7 lcod
>> >> > > > > > > > >> 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
>> >> > > > > > > > >>   -15> 2015-04-27 10:17:08.808637 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609
>> >> > > > > > > > >> n=0
>> >> > > > > > > > >> ec=1 les/c
>> >> > > > > > > > >> 16609/16659
>> >> > > > > > > > >> 16590/16590/16590) [24,3,23] r=2 lpr=17838
>> >> > > > > > > > >> pi=15659-16589/42
>> >> > > > > > > > >> crt=8480'7 lcod
>> >> > > > > > > > >> 0'0 inactive NOTIFY] enter Started/Stray
>> >> > > > > > > > >>   -14> 2015-04-27 10:17:08.808796 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> > > > > > > > >> les/c
>> >> > > > > > > > >> 17879/17879
>> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> > > > > > > > >> inactive NOTIFY] exit Reset 0.119467 4 0.000037
>> >> > > > > > > > >>   -13> 2015-04-27 10:17:08.808817 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> > > > > > > > >> les/c
>> >> > > > > > > > >> 17879/17879
>> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> > > > > > > > >> inactive NOTIFY] enter Started
>> >> > > > > > > > >>   -12> 2015-04-27 10:17:08.808828 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> > > > > > > > >> les/c
>> >> > > > > > > > >> 17879/17879
>> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> > > > > > > > >> inactive NOTIFY] enter Start
>> >> > > > > > > > >>   -11> 2015-04-27 10:17:08.808838 7fd8e748d700 1
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> > > > > > > > >> les/c
>> >> > > > > > > > >> 17879/17879
>> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> > > > > > > > >> inactive NOTIFY]
>> >> > > > > > > > >> state<Start>: transitioning to Stray
>> >> > > > > > > > >>   -10> 2015-04-27 10:17:08.808849 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> > > > > > > > >> les/c
>> >> > > > > > > > >> 17879/17879
>> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> > > > > > > > >> inactive NOTIFY] exit Start 0.000020 0 0.000000
>> >> > > > > > > > >>    -9> 2015-04-27 10:17:08.808861 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[10.181( empty local-les=17879 n=0 ec=17863
>> >> > > > > > > > >> les/c
>> >> > > > > > > > >> 17879/17879
>> >> > > > > > > > >> 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0
>> >> > > > > > > > >> inactive NOTIFY] enter Started/Stray
>> >> > > > > > > > >>    -8> 2015-04-27 10:17:08.809427 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive] exit Reset 7.511623 45 0.000165
>> >> > > > > > > > >>    -7> 2015-04-27 10:17:08.809445 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive] enter Started
>> >> > > > > > > > >>    -6> 2015-04-27 10:17:08.809456 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive] enter Start
>> >> > > > > > > > >>    -5> 2015-04-27 10:17:08.809468 7fd8e748d700 1
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive]
>> >> > > > > > > > >> state<Start>: transitioning to Primary
>> >> > > > > > > > >>    -4> 2015-04-27 10:17:08.809479 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive] exit Start 0.000023 0 0.000000
>> >> > > > > > > > >>    -3> 2015-04-27 10:17:08.809492 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive] enter Started/Primary
>> >> > > > > > > > >>    -2> 2015-04-27 10:17:08.809502 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 inactive] enter Started/Primary/Peering
>> >> > > > > > > > >>    -1> 2015-04-27 10:17:08.809513 7fd8e748d700 5
>> >> > > > > > > > >> osd.23
>> >> > > > pg_epoch:
>> >> > > > >
>> >> > > > > > > > >> 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c
>> >> > > > > > > > >> 16127/16344
>> >> > > > > > > > >> 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod
>> >> > > > > > > > >> 0'0 peering] enter Started/Primary/Peering/GetInfo
>> >> > > > > > > > >>     0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1
>> >> > > > > > > ./include/interval_set.h:
>> >> > > > > > > > >> In
>> >> > > > > > > > >> function 'void interval_set<T>::erase(T, T) [with T =
>> >> > > snapid_t]'
>> >> > > > > > > > >> thread
>> >> > > > > > > > >> 7fd8e748d700 time 2015-04-27 10:17:08.809899
>> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >=
>> >> > > > > > > > >> 0)
>> >> > > > > > > > >>
>> >> > > > > > > > >> ceph version 0.94.1
>> >> > > > > > > > >> (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> >> > > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*,
>> >> > > > > > > > >> int, char
>> >> > > > > > > > >> const*)+0x8b)
>> >> > > > > > > > >> [0xbc271b]
>> >> > > > > > > > >> 2:
>> >> > > > > > > > >> (interval_set<snapid_t>::subtract(interval_set<snapid_t
>> >> > > > > > > > >> >
>> >> > > > > > > > >> const&)+0xb0) [0x82cd50]
>> >> > > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap
>> >> > > > > > > > >> const>)+0x52e) [0x80113e]
>> >> > > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap
>> >> > > > > > > > >> const>, std::tr1::shared_ptr<OSDMap const>,
>> >> > > > > > > > >> const>std::vector<int,
>> >> > > > > > > > >> std::allocator<int> >&, int, std::vector<int,
>> >> > > > > > > > >> std::allocator<int>
>> >> > > > > > > > >> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
>> >> > > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*,
>> >> > > > > > > > >> ThreadPool::TPHandle&, PG::RecoveryCtx*,
>> >> > > > > > > > >> std::set<boost::intrusive_ptr<PG>,
>> >> > > > > > > > >> std::less<boost::intrusive_ptr<PG> >,
>> >> > > > > > > > >> std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3)
>> >> > > > > > > > >> [0x6b0e43]
>> >> > > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*,
>> >> > > > > > > > >> std::allocator<PG*>
>> >> > > > > > > > >> > const&,
>> >> > > > > > > > >> ThreadPool::TPHandle&)+0x21c) [0x6b191c]
>> >> > > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*,
>> >> > > > > > > > >> std::allocator<PG*>
>> >> > > > > > > > >> > const&,
>> >> > > > > > > > >> ThreadPool::TPHandle&)+0x18) [0x709278]
>> >> > > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e)
>> >> > > > > > > > >> [0xbb38ae]
>> >> > > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
>> >> > > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
>> >> > > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
>> >> > > > > > > > >>
>> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
>> >> > > > > > > > >> messages, also lots of
>> >> > > > > > > them.
>> >> > > > > > > > >>
>> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.?
>> >> > > > > > > 10.20.0.13:0/1174409'
>> >> > > > > > > > >> entity='osd.30' cmd=[{"prefix": "osd crush
>> >> > > > > > > > >> create-or-move",
>> >> > > > "args":
>> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 30, "weight":
>> 1.82}]:
>> >>
>> >> > > > > > > > >> dispatch
>> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.?
>> >> > > > > > > 10.20.0.13:0/1174483'
>> >> > > > > > > > >> entity='osd.26' cmd=[{"prefix": "osd crush
>> >> > > > > > > > >> create-or-move",
>> >> > > > "args":
>> >> > > > > > > > >> ["host=ceph3", "root=default"], "id": 26, "weight":
>> 1.82}]:
>> >>
>> >> > > > > > > > >> dispatch
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's, nodes are
>> >> > > > > > > > >> also mons and mds's to save servers. All run Ubuntu
>> >> 14.04.2.
>> >> > > > > > > > >>
>> >> > > > > > > > >> I have pretty much tried everything I could think of.
>> >> > > > > > > > >>
>> >> > > > > > > > >> Restarting daemons doesn't help.
>> >> > > > > > > > >>
>> >> > > > > > > > >> Any help would be appreciated. I can also provide more
>> >> > > > > > > > >> logs if necessary. They just seem to get pretty large
>> >> > > > > > > > >> in few
>> >> > > moments.
>> >> > > > > > > > >>
>> >> > > > > > > > >> Thank you
>> >> > > > > > > > >> Tuomas
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >> _______________________________________________
>> >> > > > > > > > >> ceph-users mailing list ceph-users@lists.ceph.com
>> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >>
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > _______________________________________________
>> >> > > > > > > > > ceph-users mailing list
>> >> > > > > > > > > ceph-users@lists.ceph.com
>> >> > > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > _______________________________________________
>> >> > > > > > > > ceph-users mailing list
>> >> > > > > > > > ceph-users@lists.ceph.com
>> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > _______________________________________________
>> >> > > > > > > > ceph-users mailing list
>> >> > > > > > > > ceph-users@lists.ceph.com
>> >> > > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> > >
>> >> > _______________________________________________
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >
>> >> >
>> >>
>> >
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-05-01 18:13 UTC | newest]
Thread overview: 8+ messages
[not found] <479273764e377f37b81dc6b0ccd55fb3@mail.meizo.com>
[not found] ` <770484917.5624554.1430133524268.JavaMail.zimbra@redhat.com>
[not found] ` <813bbcbbf7d7e7ab4a8e2dba2e5cf6a2@mail.meizo.com>
[not found] ` <1551034631.7094890.1430134900209.JavaMail.zimbra@redhat.com>
[not found] ` <964da36ebed90592d8f5794ac2617a36@mail.meizo.com>
[not found] ` <1226598674.7136470.1430138991322.JavaMail.zimbra@redhat.com>
[not found] ` <76bac95ebd000308018bf900d11fae1e@mail.meizo.com>
[not found] ` <alpine.DEB.2.00.1504270919020.5458@cobra.newdream.net>
[not found] ` <03cd5dfba8f5fec3f80458a92d377a60@mail.meizo.com>
[not found] ` <alpine.DEB.2.00.1504271034560.5458@cobra.newdream.net>
[not found] ` <a06d58aa527edec6225737f18abb055b@mail.meizo.com>
[not found] ` <alpine.DEB.2.00.1504271222002.5458@cobra.newdream.net>
[not found] ` <8bed4ff8a05a8b96ed848e9f1aafa576@mail.meizo.com>
[not found] ` <alpine.DEB.2.00.1504280959280.5458@cobra.newdream.net>
[not found] ` <bb760e0f01a667a582f6bda67cc31684@mail.meizo.com>
[not found] ` <alpine.DEB.2.00.1504281155530.5458@cobra.newdream.net>
[not found] ` <f9adb4b2dcada947f418b6f95ad7a8d1@mail.meizo.com>
2015-04-28 20:19 ` [ceph-users] Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down Sage Weil
[not found] ` <alpine.DEB.2.00.1504281256440.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-28 20:57 ` Sage Weil
[not found] ` <alpine.DEB.2.00.1504281355130.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-29 4:16 ` Tuomas Juntunen
[not found] ` <81216125e573cf00539f61cc090b282b-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
2015-04-29 15:38 ` Sage Weil
[not found] ` <alpine.DEB.2.00.1504290838060.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-30 3:31 ` tuomas.juntunen-TGwGjfj4lcphU2BovMVX9g
[not found] ` <928ebb7320e4eb07f14071e997ed7be2-Mp+lKDbUk+6SvdrsE3bNcA@public.gmane.org>
2015-04-30 15:23 ` Sage Weil
2015-04-30 17:27 tuomas.juntunen
2015-05-01 15:10 ` [ceph-users] " tuomas.juntunen
2015-05-01 16:04 ` Sage Weil
2015-05-01 18:13 ` [ceph-users] " tuomas.juntunen