From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sage Weil Subject: Re: Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) Date: Mon, 12 Feb 2018 19:31:42 +0000 (UTC) Message-ID: References: <8f1028e2-d322-0006-30c9-6f7f4922cb40@profihost.ag> <4f670be1-e730-f07c-c1c4-7be8e7ffce85@profihost.ag> <0db57db6-8542-70ae-07d3-beb061a1a66c@profihost.ag> <0edbd7e3-9607-4009-c475-237f514446e6@profihost.ag> <87644c12-981a-f39d-c744-6073c3beb82c@profihost.ag> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Return-path: Received: from mx3-rdu2.redhat.com ([66.187.233.73]:53456 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750869AbeBLTbp (ORCPT ); Mon, 12 Feb 2018 14:31:45 -0500 In-Reply-To: <87644c12-981a-f39d-c744-6073c3beb82c@profihost.ag> Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Stefan Priebe - Profihost AG Cc: Gregory Farnum , "ceph-devel@vger.kernel.org" , n.fahldieck@profihost.ag On Mon, 12 Feb 2018, Stefan Priebe - Profihost AG wrote: > Hi Sage, > > did you have any time to look at the logs? Sorry, I thought I replied earlier. See below > Am 05.02.2018 um 14:39 schrieb Stefan Priebe - Profihost AG: > > Hi Sage, > > > > Am 02.02.2018 um 22:05 schrieb Sage Weil: > >> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >>> Am 02.02.2018 um 20:28 schrieb Sage Weil: > >>>> On Fri, 2 Feb 2018, Stefan Priebe - Profihost AG wrote: > >>>>> Hi Sage, > >>>>> > >>>>> some days ago i reverted to stock luminous with > >>>>> https://github.com/liewegas/ceph/commit/e6a04446d21689ae131b0b683b4d76c7c30d8e6a > >>>>> + > >>>>> https://github.com/liewegas/ceph/commit/e17493bdffd7b9a349f377e7877fac741221edfa > >>>>> on top. I scrubbed and deep-scrubbed all pgs before and no errors were > >>>>> found. > >>>>> > >>>>> this worked fine until the ceph balancer ran and started remapping > >>>>> some pgs. 
> >>>>> > >>>>> This again triggered this assert: > >>>>> > >>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>> SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' > >>>>> thread 7f4289bff700 time 2018-02-02 17:46:33.249579 > >>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>> > >>>>> Any idea how to fix it? > >>>> > >>>> Cheap workaround is to comment out that assert. I'm still trying to sort > >>> > >>> Yes i did but i feel uncomfortable with it as the snapmapper value is > >>> still wrong. > >>> > >>>> out how this happened, though. Not sure if you shared this already, but: > >>>> - bluestore or filestore? > >>> > >>> As i first reported it only filestore. Now also some bluestore OSDs. > >>> > >>>> - no cache tiering, right? nowhere on the cluster? > >>> Yes and Yes. Never used cache tiering at all. > >>> > >>>> It looks like the SnapMapper entry with an empty snaps vector came in via > >>>> add_oid() (update_oid() asserts if the new_snaps is empty; add_oid() > >>>> doesn't have that check). There are a few places it could come from, > >>>> including the on_local_recover() call and the update_snap_map() call. > >>>> That last one is interesting because it pulls the snaps for a new clone > >>>> out of the pg log entry.. maybe those aren't getting populated properly. > >>> > >>> Is it also possible that they got corrupted in the past and now ceph is > >>> unable to detect it and repair it with a scrub or deep scrub? > >>> > >>>> Can you confirm that all of the OSDs on the cluster are running the > >>>> latest? (ceph versions). > >>> > >>> Yes they do - i always check this after each update and i rechecked it > >>> again. > >> > >> Ok thanks! > >> > >> I pushed a branch wip-snapmapper-debug to github.com/liewegas/ceph that > >> has those commits above plus another one that adds debug messages in the > >> various places that the empty snap vectors might be coming from. 
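The add_oid()/update_oid() asymmetry described above can be sketched standalone. This is a toy model with illustrative names, not the real SnapMapper API (which stores encoded object_snaps bufferlists behind a MapCacher backend); it only shows how an empty snap set can be stored via the add path and then surprise a later read:

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Toy stand-in for the SnapMapper object -> snaps table.
using snapid_t = unsigned long long;
static std::map<std::string, std::set<snapid_t>> snap_table;

// update_oid()-style path: rejects an empty snap vector up front
// (the real code asserts on empty new_snaps).
bool update_oid_sketch(const std::string& oid, const std::set<snapid_t>& snaps) {
  if (snaps.empty())
    return false;
  snap_table[oid] = snaps;
  return true;
}

// add_oid()-style path: no such check, so an empty vector can be stored...
void add_oid_sketch(const std::string& oid, const std::set<snapid_t>& snaps) {
  snap_table[oid] = snaps;
}

// ...and a later get_snaps()-style read then finds an entry whose snap set
// is empty, which is the state assert(!out->snaps.empty()) trips on.
int get_snaps_sketch(const std::string& oid, std::set<snapid_t>* out) {
  auto it = snap_table.find(oid);
  if (it == snap_table.end())
    return -ENOENT;  // clean "no mapping" case
  if (out)
    *out = it->second;
  return 0;
}
```

With a guard like update_oid_sketch() on every write path, an entry with an empty snap set could never be persisted in the first place.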
> >> > >> This won't help it get past the rebalance issue you hit above, but > >> hopefully it will trigger when you create a new clone so we can see how > >> this is happening. > >> > >> If we need to, I think we can ignore the assert (by commenting it out), > >> but one downside there is that we'll see my warnings pop up when > >> recovering those objects so we won't be able to tell where they really > >> came from. Since you were seeing this on new snaps/clones, though, I'm > >> hoping we won't end up in that situation! > > > > the biggest part of the logs is now this stuff: > > 2018-02-05 07:56:32.712366 7fb6893ff700 0 log_channel(cluster) log > > [DBG] : 3.740 scrub ok > > 2018-02-05 08:00:16.033843 7fb68a7fe700 -1 osd.0 pg_epoch: 945191 > > pg[3.bbf( v 945191'63020848 (945191'63019283,945191'63020848] > > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f > > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 > > luod=0'0 crt=945191'63020848 lcod 945191'63020847 active] _scan_snaps no > > head for 3:fdd87083:::rbd_data.67b0ee6b8b4567.00000000000103a1:b0d77 > > (have MIN) > > 2018-02-05 08:00:42.123804 7fb6833ff700 -1 osd.0 pg_epoch: 945191 > > pg[3.bbf( v 945191'63020901 (945191'63019383,945191'63020901] > > local-lis/les=944620/944621 n=1612 ec=15/15 lis/c 944620/944620 les/c/f > > 944621/944621/937357 944620/944620/944507) [51,54,0] r=2 lpr=944620 > > luod=0'0 crt=945191'63020901 lcod 945191'63020900 active] _scan_snaps no > > head for 3:fddb41cc:::rbd_data.9772fb6b8b4567.0000000000000380:b0d9f > > (have MIN) > > 2018-02-05 08:03:49.583574 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090546 (945191'52088948,945191'52090546] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090546 lcod 945191'52090545 active] _scan_snaps no > > head for 
3:5530360d:::rbd_data.52a6596b8b4567.0000000000000929:b0d55 > > (have MIN) > > 2018-02-05 08:03:50.432394 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > > head for 3:553243fa:::rbd_data.67b0ee6b8b4567.0000000000009956:b0d77 > > (have MIN) > > 2018-02-05 08:03:52.523643 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > > head for 3:5536f170:::rbd_data.c1a3d66b8b4567.000000000000b729:b0dc5 > > (have MIN) > > 2018-02-05 08:03:55.471369 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.caa( v 945191'52090549 (945191'52089048,945191'52090549] > > local-lis/les=944694/944695 n=1709 ec=15/15 lis/c 944694/944694 les/c/f > > 944695/944697/937357 944694/944694/944653) [70,0,93] r=1 lpr=944694 > > luod=0'0 crt=945191'52090549 lcod 945191'52090548 active] _scan_snaps no > > head for 3:553d2114:::rbd_data.154a9e6b8b4567.0000000000001ea1:b0e4e > > (have MIN) > > 2018-02-05 08:08:55.440715 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144763900 (945191'144762311,945191'144763900] > > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144763900 lcod 945191'144763899 active] _scan_snaps > > no head for 3:55e14d0f:::rbd_data.67b0ee6b8b4567.00000000000025f2:b0d77 > > (have MIN) > > 2018-02-05 08:09:09.295844 7fb68efff700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144763970 
(945191'144762411,945191'144763970] > > local-lis/les=944522/944523 n=1761 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144763970 lcod 945191'144763969 active] _scan_snaps > > no head for 3:55e2d7a9:::rbd_data.e40a416b8b4567.000000000000576f:b0df3 > > (have MIN) > > 2018-02-05 08:09:26.285493 7fb68efff700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144764043 (945191'144762511,945191'144764043] > > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144764043 lcod 945191'144764042 active] _scan_snaps > > no head for 3:55e4cbad:::rbd_data.163e9c96b8b4567.000000000000202d:b0d42 > > (have MIN) > > 2018-02-05 08:10:36.490452 7fb687bfe700 -1 osd.0 pg_epoch: 945191 > > pg[3.7aa( v 945191'144764243 (945191'144762711,945191'144764243] > > local-lis/les=944522/944523 n=1762 ec=15/15 lis/c 944522/944522 les/c/f > > 944523/944551/937357 944522/944522/944522) [39,37,0] r=2 lpr=944522 > > luod=0'0 crt=945191'144764243 lcod 945191'144764242 active] _scan_snaps > > no head for 3:55ece3c3:::rbd_data.e60f906b8b4567.00000000000000de:b0dd7 > > (have MIN) > > > > Found nothing else in the logs until now. Should i search for more OSDs > > or is this already something we should consider? Yeah, those messages aren't super informative. What would be useful is to check for any of those (or, more importantly, the other new messages that branch adds) on *new* object clones for snapshots that have just been created. (Earlier, you mentioned that the errors you were seeing referred to snapshots that had just been created, right? The key question is how those new clones are getting registered in snapmapper with an empty list of snaps.) Thanks! sage > > Stefan > > > >> Thanks! 
> >> sage > >> > >> > >> > >> > > >>> Stefan > >>> > >>> > >>>>> > >>>>> Greets, > >>>>> Stefan > >>>>> > >>>>> Am 22.01.2018 um 20:01 schrieb Sage Weil: > >>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>> Hi Sage, > >>>>>>> > >>>>>>> thanks! That one fixed it! Now i'm just waiting if the log ever stops > >>>>>>> telling me: > >>>>>>> 2018-01-22 19:48:44.204004 osd.45 [ERR] osd.45 found snap mapper error > >>>>>>> on pg 3.33e oid > >>>>>>> 3:7cc2e79f:::rbd_data.52ab946b8b4567.00000000000043fb:b0d7c snaps in > >>>>>>> mapper: , oi: b0d7c...repaired > >>>>>>> > >>>>>>> Currently it seems it does an endless repair and never ends. > >>>>>> > >>>>>> On many objects or the same object? > >>>>>> > >>>>>> sage > >>>>>> > >>>>>>> > >>>>>>> Greets, > >>>>>>> Stefan > >>>>>>> Am 22.01.2018 um 15:30 schrieb Sage Weil: > >>>>>>>> On Mon, 22 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>> Hi Sage, > >>>>>>>>> > >>>>>>>>> get_snaps returned the error as there are no snaps in the snapmapper. So > >>>>>>>>> it's OK to skip the assert. > >>>>>>>>> > >>>>>>>>> My biggest problem is now that i'm not able to run a scrub on all PGs to > >>>>>>>>> fix all those errors because in around 1-3% the osd seem to deadlock and > >>>>>>>>> needs to get killed to proceed. > >>>>>>>> > >>>>>>>> I believe this will fix it: > >>>>>>>> > >>>>>>>> https://github.com/liewegas/ceph/commit/wip-snapmapper > >>>>>>>> > >>>>>>>> Cherry-pick that and let me know? > >>>>>>>> > >>>>>>>> Thanks! 
> >>>>>>>> sage > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> The log output is always: > >>>>>>>>> -3> 2018-01-22 14:02:41.725678 7f10025fe700 0 log_channel(cluster) > >>>>>>>>> log [WRN] : slow request 141.951820 seconds old, received at > >>>>>>>>> 2018-01-22 14:00:19.773706: osd_repop(client.142132381.0:1310126 3.a65 > >>>>>>>>> e927921/927902 3:a65be829:::rbd_data.4fbc586b8b4567.00000000 > >>>>>>>>> 00000157:head v 927921'526559757) currently commit_sent > >>>>>>>>> -2> 2018-01-22 14:02:41.725679 7f10025fe700 0 log_channel(cluster) > >>>>>>>>> log [WRN] : slow request 141.643853 seconds old, received at > >>>>>>>>> 2018-01-22 14:00:20.081674: osd_repop(client.139148241.0:15454227 3.ff7 > >>>>>>>>> e927921/927798 3:eff12952:::rbd_data.2494d06b8b4567.0000000 > >>>>>>>>> 00000147d:head v 927921'29105383) currently commit_sent > >>>>>>>>> -1> 2018-01-22 14:02:41.725684 7f10025fe700 0 log_channel(cluster) > >>>>>>>>> log [WRN] : slow request 141.313220 seconds old, received at > >>>>>>>>> 2018-01-22 14:00:20.412307: osd_repop(client.138984198.0:82792023 3.de6 > >>>>>>>>> e927921/927725 3:67b9dc7d:::rbd_data.7750506b8b4567.0000000 > >>>>>>>>> 000000aff:head v 927921'164552063) currently commit_sent > >>>>>>>>> 0> 2018-01-22 14:02:42.426910 7f0fc23fe700 -1 *** Caught signal > >>>>>>>>> (Aborted) ** > >>>>>>>>> in thread 7f0fc23fe700 thread_name:tp_osd_tp > >>>>>>>>> > >>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>> 1: (()+0xa4423c) [0x561d876ab23c] > >>>>>>>>> 2: (()+0xf890) [0x7f100b974890] > >>>>>>>>> 3: (pthread_cond_wait()+0xbf) [0x7f100b97104f] > >>>>>>>>> 4: (ObjectStore::apply_transactions(ObjectStore::Sequencer*, > >>>>>>>>> std::vector >>>>>>>>> saction> >&, Context*)+0x254) [0x561d8746ac54] > >>>>>>>>> 5: (ObjectStore::apply_transaction(ObjectStore::Sequencer*, > >>>>>>>>> ObjectStore::Transaction&&, Context*)+0x5c) [0x561d87199b2c] > >>>>>>>>> 6: 
(PG::_scan_snaps(ScrubMap&)+0x8f7) [0x561d871ebf77] > >>>>>>>>> 7: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x561d871ecc4f] > >>>>>>>>> 8: (PG::replica_scrub(boost::intrusive_ptr, > >>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x561d871ed574] > >>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr&, > >>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x561d872ac756] > >>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr, > >>>>>>>>> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>> [0x561d87138bc7] > >>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr > >>>>>>>>> const&)+0x57) [0x561d873b0db7] > >>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561d87167d1c] > >>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>> [0x561d876f42bd] > >>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561d876f6280] > >>>>>>>>> 15: (()+0x8064) [0x7f100b96d064] > >>>>>>>>> 16: (clone()+0x6d) [0x7f100aa6162d] > >>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS ` is > >>>>>>>>> needed to interpret this. 
> >>>>>>>>> > >>>>>>>>> --- logging levels --- > >>>>>>>>> 0/ 5 none > >>>>>>>>> 0/ 0 lockdep > >>>>>>>>> 0/ 0 context > >>>>>>>>> 0/ 0 crush > >>>>>>>>> 0/ 0 mds > >>>>>>>>> 0/ 0 mds_balancer > >>>>>>>>> 0/ 0 mds_locker > >>>>>>>>> 0/ 0 mds_log > >>>>>>>>> 0/ 0 mds_log_expire > >>>>>>>>> 0/ 0 mds_migrator > >>>>>>>>> 0/ 0 buffer > >>>>>>>>> 0/ 0 timer > >>>>>>>>> 0/ 0 filer > >>>>>>>>> 0/ 1 striper > >>>>>>>>> 0/ 0 objecter > >>>>>>>>> 0/ 0 rados > >>>>>>>>> 0/ 0 rbd > >>>>>>>>> 0/ 5 rbd_mirror > >>>>>>>>> 0/ 5 rbd_replay > >>>>>>>>> 0/ 0 journaler > >>>>>>>>> 0/ 0 objectcacher > >>>>>>>>> 0/ 0 client > >>>>>>>>> 0/ 0 osd > >>>>>>>>> 0/ 0 optracker > >>>>>>>>> 0/ 0 objclass > >>>>>>>>> 0/ 0 filestore > >>>>>>>>> 0/ 0 journal > >>>>>>>>> 0/ 0 ms > >>>>>>>>> 0/ 0 mon > >>>>>>>>> 0/ 0 monc > >>>>>>>>> 0/ 0 paxos > >>>>>>>>> 0/ 0 tp > >>>>>>>>> 0/ 0 auth > >>>>>>>>> 1/ 5 crypto > >>>>>>>>> 0/ 0 finisher > >>>>>>>>> 1/ 1 reserver > >>>>>>>>> 0/ 0 heartbeatmap > >>>>>>>>> 0/ 0 perfcounter > >>>>>>>>> 0/ 0 rgw > >>>>>>>>> 1/10 civetweb > >>>>>>>>> 1/ 5 javaclient > >>>>>>>>> 0/ 0 asok > >>>>>>>>> 0/ 0 throttle > >>>>>>>>> 0/ 0 refs > >>>>>>>>> 1/ 5 xio > >>>>>>>>> 1/ 5 compressor > >>>>>>>>> 1/ 5 bluestore > >>>>>>>>> 1/ 5 bluefs > >>>>>>>>> 1/ 3 bdev > >>>>>>>>> 1/ 5 kstore > >>>>>>>>> 4/ 5 rocksdb > >>>>>>>>> 4/ 5 leveldb > >>>>>>>>> 4/ 5 memdb > >>>>>>>>> 1/ 5 kinetic > >>>>>>>>> 1/ 5 fuse > >>>>>>>>> 1/ 5 mgr > >>>>>>>>> 1/ 5 mgrc > >>>>>>>>> 1/ 5 dpdk > >>>>>>>>> 1/ 5 eventtrace > >>>>>>>>> -2/-2 (syslog threshold) > >>>>>>>>> -1/-1 (stderr threshold) > >>>>>>>>> max_recent 10000 > >>>>>>>>> max_new 1000 > >>>>>>>>> log_file /var/log/ceph/ceph-osd.59.log > >>>>>>>>> --- end dump of recent events --- > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Greets, > >>>>>>>>> Stefan > >>>>>>>>> > >>>>>>>>> Am 21.01.2018 um 21:27 schrieb Sage Weil: > >>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>> Hi Sage, > >>>>>>>>>>> > >>>>>>>>>>> thanks for your 
reply. I'll do this. What i still don't understand is > >>>>>>>>>>> the following and i hope you might have an idea: > >>>>>>>>>>> 1.) All the snaps mentioned here are fresh - i created them today > >>>>>>>>>>> running luminous? How can they be missing in snapmapper again? > >>>>>>>>>> > >>>>>>>>>> I'm not sure. It might help if you can set debug_osd=1/10 (or 1/20) so > >>>>>>>>>> that when you hit the crash we have some context as to what is going on. > >>>>>>>>>> > >>>>>>>>>> Did you track down which part of get_snaps() is returning the error? > >>>>>>>>>> > >>>>>>>>>>> 2.) How did this happen? The cluster was jewel and was updated to > >>>>>>>>>>> luminous when all this happened. > >>>>>>>>>> > >>>>>>>>>> Did this start right after the upgrade? > >>>>>>>>>> > >>>>>>>>>> I started a PR that relaxes some of these assertions so that they clean up > >>>>>>>>>> instead (unless the debug option is enabled, for our regression testing), > >>>>>>>>>> but I'm still unclear about what the inconsistency is... any logging you > >>>>>>>>>> can provide would help! > >>>>>>>>>> > >>>>>>>>>> Thanks- > >>>>>>>>>> sage > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Greets, > >>>>>>>>>>> Stefan > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Am 19.01.2018 um 21:19 schrieb Sage Weil: > >>>>>>>>>>>> I'm guessing it is coming from > >>>>>>>>>>>> > >>>>>>>>>>>> object_snaps out; > >>>>>>>>>>>> int r = get_snaps(oid, &out); > >>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>> return r; > >>>>>>>>>>>> > >>>>>>>>>>>> in SnapMapper::update_snaps(). I would add a debug statement to > >>>>>>>>>>>> confirm that, and possibly also add debug lines to the various return > >>>>>>>>>>>> points in get_snaps() so you can see why get_snaps is returning an error. > >>>>>>>>>>>> > >>>>>>>>>>>> I'm guessing the way to proceed is to warn but ignore the error, but it's > >>>>>>>>>>>> hard to say without seeing what the inconsistency there is and whether we > >>>>>>>>>>>> can tolerate it... 
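Instrumenting the return points as suggested might look roughly like this. A sketch only: it uses a toy key/value map rather than the real MapCacher-backed tree, fprintf stands in for dout, and the -EINVAL path is the "warn but ignore" variant rather than the actual assert:

```cpp
#include <cerrno>
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Toy backend: object key -> snap set (the real backend stores encoded bufferlists).
static std::map<std::string, std::set<unsigned long long>> backend_kv;

// get_snaps()-style lookup with a log line at every failure return, so the
// log shows *which* mode produced the error: missing key vs. empty snap set.
int get_snaps_instrumented(const std::string& key,
                           std::set<unsigned long long>* out) {
  auto it = backend_kv.find(key);
  if (it == backend_kv.end()) {
    std::fprintf(stderr, "get_snaps %s: no key in backend -> -ENOENT\n",
                 key.c_str());
    return -ENOENT;
  }
  if (out) {
    *out = it->second;
    if (out->empty()) {
      std::fprintf(stderr, "get_snaps %s: entry present but snaps empty\n",
                   key.c_str());
      return -EINVAL;  // warn-and-return instead of asserting
    }
  }
  return 0;
}
```

Distinguishing the two failure returns in the log is what tells you whether the mapping is missing entirely or was written with an empty snap vector.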
> >>>>>>>>>>>> > >>>>>>>>>>>> sage > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, 19 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>> > >>>>>>>>>>>>> any ideas for this one, seen while doing a deep scrub: > >>>>>>>>>>>>> > >>>>>>>>>>>>> -1> 2018-01-19 21:14:20.474287 7fbcdeffe700 -1 osd.5 pg_epoch: > >>>>>>>>>>>>> 925542 pg[3.306( v 925542'78349255 (925334'78347749,925542'78349255] > >>>>>>>>>>>>> local-lis/les=925503/925504 n=1752 ec=15/15 lis/c 925503/925503 les/c/f > >>>>>>>>>>>>> 925504/925517/0 925503/925503/925503) [75,31,5] r=2 lpr=925503 luod=0'0 > >>>>>>>>>>>>> crt=925542'78349255 lcod 925542'78349254 active] _scan_snaps no head for > >>>>>>>>>>>>> 3:60c0267d:::rbd_data.103fdc6b8b4567.0000000000004c8e:b0ce7 (have MIN) > >>>>>>>>>>>>> > >>>>>>>>>>>>> 0> 2018-01-19 21:14:20.477650 7fbcd3bfe700 -1 > >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: In function 'void PG::update_snap_map(const > >>>>>>>>>>>>> std::vector<pg_log_entry_t>&, ObjectStore::Transaction&)' thread > >>>>>>>>>>>>> 7fbcd3bfe700 time 2018-01-19 21:14:20.474752 > >>>>>>>>>>>>> /build/ceph/src/osd/PG.cc: 3420: FAILED assert(r == 0) > >>>>>>>>>>>>> > >>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>> const*)+0x102) [0x55c0100d0372] > >>>>>>>>>>>>> 2: (PG::update_snap_map(std::vector<pg_log_entry_t, > >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>> ObjectStore::Transaction&)+0x3a5) [0x55c00fbcfe35] > >>>>>>>>>>>>> 3: (PG::append_log(std::vector<pg_log_entry_t, > >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, > >>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x540) [0x55c00fbf5fc0] > >>>>>>>>>>>>> 4: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, > >>>>>>>>>>>>> std::allocator<pg_log_entry_t> > const&, > >>>>>>>>>>>>> boost::optional<pg_hit_set_history_t> const&, eversion_t const&, > >>>>>>>>>>>>> eversion_t const&, bool, 
ObjectStore::Transaction&)+0x64) [0x55c00fceaeb4] > >>>>>>>>>>>>> 5: (ReplicatedBackend::do_repop(boost::intrusive_ptr)+0xa92) > >>>>>>>>>>>>> [0x55c00fdf6642] > >>>>>>>>>>>>> 6: > >>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x2a4) > >>>>>>>>>>>>> [0x55c00fdfc684] > >>>>>>>>>>>>> 7: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) > >>>>>>>>>>>>> [0x55c00fd22030] > >>>>>>>>>>>>> 8: (PrimaryLogPG::do_request(boost::intrusive_ptr&, > >>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55c00fc8e33b] > >>>>>>>>>>>>> 9: (OSD::dequeue_op(boost::intrusive_ptr, > >>>>>>>>>>>>> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>> [0x55c00fb1abc7] > >>>>>>>>>>>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr > >>>>>>>>>>>>> const&)+0x57) [0x55c00fd92ad7] > >>>>>>>>>>>>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55c00fb49d1c] > >>>>>>>>>>>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>> [0x55c0100d5ffd] > >>>>>>>>>>>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c0100d7fc0] > >>>>>>>>>>>>> 14: (()+0x8064) [0x7fbd284fb064] > >>>>>>>>>>>>> 15: (clone()+0x6d) [0x7fbd275ef62d] > >>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS ` is > >>>>>>>>>>>>> needed to interpret this. 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Stefan > >>>>>>>>>>>>> > >>>>>>>>>>>>> Am 18.01.2018 um 15:24 schrieb Sage Weil: > >>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Am 18.01.2018 um 14:16 schrieb Sage Weil: > >>>>>>>>>>>>>>>> On Thu, 18 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> it also crashes in (marked with HERE): > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> int SnapMapper::get_snaps( > >>>>>>>>>>>>>>>>> const hobject_t &oid, > >>>>>>>>>>>>>>>>> object_snaps *out) > >>>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>> assert(check(oid)); > >>>>>>>>>>>>>>>>> set<string> keys; > >>>>>>>>>>>>>>>>> map<string, bufferlist> got; > >>>>>>>>>>>>>>>>> keys.insert(to_object_key(oid)); > >>>>>>>>>>>>>>>>> int r = backend.get_keys(keys, &got); > >>>>>>>>>>>>>>>>> if (r < 0) > >>>>>>>>>>>>>>>>> return r; > >>>>>>>>>>>>>>>>> if (got.empty()) > >>>>>>>>>>>>>>>>> return -ENOENT; > >>>>>>>>>>>>>>>>> if (out) { > >>>>>>>>>>>>>>>>> bufferlist::iterator bp = got.begin()->second.begin(); > >>>>>>>>>>>>>>>>> ::decode(*out, bp); > >>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " " << out->snaps << dendl; > >>>>>>>>>>>>>>>>> assert(!out->snaps.empty()); ########### HERE ########### > >>>>>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>>>>> dout(20) << __func__ << " " << oid << " (out == NULL)" << dendl; > >>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>> return 0; > >>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> is it safe to comment that assert? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I think so. All of these asserts are just about ensuring the snapmapper > >>>>>>>>>>>>>>>> state is consistent, and its only purpose is to find snaps to trim. > >>>>>>>>>>>>>>>> Since your snapmapper metadata is clearly not consistent, there isn't a > >>>>>>>>>>>>>>>> whole lot of risk here. 
You might want to set nosnaptrim for the time > >>>>>>>>>>>>>>>> being so you don't get a surprise if trimming kicks in. The next step is > >>>>>>>>>>>>>>>> probably going to be to do a scrub and see what it finds, repairs, or > >>>>>>>>>>>>>>>> can't repair. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> snap trimming works fine i already trimmed and removed some snaps. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I was able to get the cluster into a state where all is backfilled. But > >>>>>>>>>>>>>>> enabling / doing deep-scrubs results into this one: > >>>>>>>>>>>>>>> 0> 2018-01-18 13:00:54.843076 7fbd253ff700 -1 > >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'int > >>>>>>>>>>>>>>> SnapMapper::get_snaps(const h > >>>>>>>>>>>>>>> object_t&, SnapMapper::object_snaps*)' thread 7fbd253ff700 time > >>>>>>>>>>>>>>> 2018-01-18 13:00:54.840396 > >>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 154: FAILED assert(!out->snaps.empty()) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think if you switch that assert to a warning it will repair... 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> sage > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>> const*)+0x102) [0x5561ce28d1f2] > >>>>>>>>>>>>>>> 2: (SnapMapper::get_snaps(hobject_t const&, > >>>>>>>>>>>>>>> SnapMapper::object_snaps*)+0x46b) [0x5561cdef755b] > >>>>>>>>>>>>>>> 3: (SnapMapper::get_snaps(hobject_t const&, std::set >>>>>>>>>>>>>>> std::less, std::allocator >*)+0xd7) [0x5561cdef7 > >>>>>>>>>>>>>>> 697] > >>>>>>>>>>>>>>> 4: (PG::_scan_snaps(ScrubMap&)+0x692) [0x5561cdd8ad22] > >>>>>>>>>>>>>>> 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, > >>>>>>>>>>>>>>> unsigned int, ThreadPool::TPHandle&)+0x22f) [0x5561cdd8bc5f] > >>>>>>>>>>>>>>> 6: (PG::replica_scrub(boost::intrusive_ptr, > >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x624) [0x5561cdd8c584] > >>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr&, > >>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x8b6) [0x5561cde4b326] > >>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr, > >>>>>>>>>>>>>>> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>> [0x5561cdcd7bc7] > >>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr > >>>>>>>>>>>>>>> const&)+0x57) [0x5561cdf4f957] > >>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x5561cdd06d1c] > >>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>> [0x5561ce292e7d] > >>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5561ce294e40] > >>>>>>>>>>>>>>> 13: (()+0x8064) [0x7fbd70d86064] > >>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7fbd6fe7a62d] > >>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS ` is > >>>>>>>>>>>>>>> needed to interpret this. 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:56 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 19:48 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>>> Hi Sage, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> this gives me another crash while that pg is recovering: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 0> 2018-01-17 19:25:09.328935 7f48f8fff700 -1 > >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: In function 'virtual void > >>>>>>>>>>>>>>>>>>>>> PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, > >>>>>>>>>>>>>>>>>>>>> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7f48f8fff700 > >>>>>>>>>>>>>>>>>>>>> time 2018-01-17 19:25:09.322287 > >>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != > >>>>>>>>>>>>>>>>>>>>> recovery_info.ss.clone_snaps.end()) > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Is this a cache tiering pool? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> no, normal 3 replica but the pg is degraded: > >>>>>>>>>>>>>>>>>>> ceph pg dump | grep 3.80e > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 3.80e 1709 0 1579 0 0 6183674368 > >>>>>>>>>>>>>>>>>>> 10014 10014 active+undersized+degraded+remapped+backfill_wait > >>>>>>>>>>>>>>>>>>> 2018-01-17 19:42:55.840884 918403'70041375 918403:77171331 [50,54,86] > >>>>>>>>>>>>>>>>>>> 50 [39] 39 907737'69776430 2018-01-14 > >>>>>>>>>>>>>>>>>>> 22:19:54.110864 907737'69776430 2018-01-14 22:19:54.110864 > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hrm, no real clues on the root cause then. 
Something like this will work > >>>>>>>>>>>>>>>>>> around the current assert: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>>> index d42f3a401b..0f76134f74 100644 > >>>>>>>>>>>>>>>>>> --- a/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>>> +++ b/src/osd/PrimaryLogPG.cc > >>>>>>>>>>>>>>>>>> @@ -372,8 +372,11 @@ void PrimaryLogPG::on_local_recover( > >>>>>>>>>>>>>>>>>> set snaps; > >>>>>>>>>>>>>>>>>> dout(20) << " snapset " << recovery_info.ss << dendl; > >>>>>>>>>>>>>>>>>> auto p = recovery_info.ss.clone_snaps.find(hoid.snap); > >>>>>>>>>>>>>>>>>> - assert(p != recovery_info.ss.clone_snaps.end()); // hmm, should we warn? > >>>>>>>>>>>>>>>>>> - snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>>>>> + if (p != recovery_info.ss.clone_snaps.end()) { > >>>>>>>>>>>>>>>>>> + derr << __func__ << " no clone_snaps for " << hoid << dendl; > >>>>>>>>>>>>>>>>>> + } else { > >>>>>>>>>>>>>>>>>> + snaps.insert(p->second.begin(), p->second.end()); > >>>>>>>>>>>>>>>>>> + } > >>>>>>>>>>>>>>>>>> dout(20) << " snaps " << snaps << dendl; > >>>>>>>>>>>>>>>>>> snap_mapper.add_oid( > >>>>>>>>>>>>>>>>>> recovery_info.soid, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> s > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-94-g92923ef > >>>>>>>>>>>>>>>>>>>>> (92923ef323d32d8321e86703ce1f9016f19472fb) luminous (stable) > >>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x55addb5eb1f2] > >>>>>>>>>>>>>>>>>>>>> 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo > >>>>>>>>>>>>>>>>>>>>> const&, std::shared_ptr, bool, > >>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction*)+0x11f0) [0x55addb1957a0] > >>>>>>>>>>>>>>>>>>>>> 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, > 
>>>>>>>>>>>>>>>>>>>>> PushReplyOp*, ObjectStore::Transaction*)+0x31d) [0x55addb3071ed] > >>>>>>>>>>>>>>>>>>>>> 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x18f) > >>>>>>>>>>>>>>>>>>>>> [0x55addb30748f] > >>>>>>>>>>>>>>>>>>>>> 5: > >>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2d1) > >>>>>>>>>>>>>>>>>>>>> [0x55addb317531] > >>>>>>>>>>>>>>>>>>>>> 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>>>>>>>>>>>>>>>>>>>> [0x55addb23cf10] > >>>>>>>>>>>>>>>>>>>>> 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x55addb1a91eb] > >>>>>>>>>>>>>>>>>>>>> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>> [0x55addb035bc7] > >>>>>>>>>>>>>>>>>>>>> 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x55addb2ad957] > >>>>>>>>>>>>>>>>>>>>> 10: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x55addb064d1c] > >>>>>>>>>>>>>>>>>>>>> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>>> [0x55addb5f0e7d] > >>>>>>>>>>>>>>>>>>>>> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55addb5f2e40] > >>>>>>>>>>>>>>>>>>>>> 13: (()+0x8064) [0x7f4955b68064] > >>>>>>>>>>>>>>>>>>>>> 14: (clone()+0x6d) [0x7f4954c5c62d] > >>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS ` is > >>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Greets, > >>>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Am 17.01.2018 um 15:28 schrieb Sage Weil: > >>>>>>>>>>>>>>>>>>>>>> On Wed, 17 Jan 2018, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> is there any chance to fix this instead of removing manually all the clones? 
> >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I believe you can avoid the immediate problem and get the PG up by > >>>>>>>>>>>>>>>>>>>>>> commenting out the assert. set_snaps() will overwrite the object->snap > >>>>>>>>>>>>>>>>>>>>>> list mapping. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> The problem is you'll probably still have a stray snapid -> object mapping, so > >>>>>>>>>>>>>>>>>>>>>> when snaptrimming runs you might end up with a PG in the snaptrim_error > >>>>>>>>>>>>>>>>>>>>>> state that won't trim (although from a quick look at the code it won't > >>>>>>>>>>>>>>>>>>>>>> crash). I'd probably remove the assert and deal with that if/when it > >>>>>>>>>>>>>>>>>>>>>> happens. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I'm adding a ticket to relax these asserts for production but keep them > >>>>>>>>>>>>>>>>>>>>>> enabled for qa. This isn't something that needs to take down the OSD! > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> sage > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Stefan > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On 16.01.2018 at 23:24, Gregory Farnum wrote: > >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 15, 2018 at 5:23 PM, Stefan Priebe - Profihost AG > >>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> currently one of my clusters is missing a whole pg due to all 3 osds > >>>>>>>>>>>>>>>>>>>>>>>>> being down. 
> >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> All of them fail with: > >>>>>>>>>>>>>>>>>>>>>>>>> 0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 > >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: In function 'void > >>>>>>>>>>>>>>>>>>>>>>>>> SnapMapper::add_oid(const hobject_t&, const std::set&, > >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction, ceph::buffer::list>*)' > >>>>>>>>>>>>>>>>>>>>>>>>> thread 7f944dbfe700 time 2018-01-16 02:05:33.349946 > >>>>>>>>>>>>>>>>>>>>>>>>> /build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2) > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> ceph version 12.2.2-93-gd6da8d7 > >>>>>>>>>>>>>>>>>>>>>>>>> (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable) > >>>>>>>>>>>>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>>>>>>>>>>>>>>>>>>>>>>>> const*)+0x102) [0x561f9ff0b1e2] > >>>>>>>>>>>>>>>>>>>>>>>>> 2: (SnapMapper::add_oid(hobject_t const&, std::set >>>>>>>>>>>>>>>>>>>>>>>>> std::less, std::allocator > const&, > >>>>>>>>>>>>>>>>>>>>>>>>> MapCacher::Transaction*)+0x64b) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb76f3b] > >>>>>>>>>>>>>>>>>>>>>>>>> 3: (PG::update_snap_map(std::vector >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator > const&, > >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f] > >>>>>>>>>>>>>>>>>>>>>>>>> 4: (PG::append_log(std::vector >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator > const&, eversion_t, eversion_t, > >>>>>>>>>>>>>>>>>>>>>>>>> ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018] > >>>>>>>>>>>>>>>>>>>>>>>>> 5: (PrimaryLogPG::log_operation(std::vector >>>>>>>>>>>>>>>>>>>>>>>>> std::allocator > const&, > >>>>>>>>>>>>>>>>>>>>>>>>> boost::optional const&, eversion_t const&, > >>>>>>>>>>>>>>>>>>>>>>>>> eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64] > >>>>>>>>>>>>>>>>>>>>>>>>> 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr)+0xa92) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc314b2] > 
>>>>>>>>>>>>>>>>>>>>>>>>> 7: > >>>>>>>>>>>>>>>>>>>>>>>>> (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x2a4) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fc374f4] > >>>>>>>>>>>>>>>>>>>>>>>>> 8: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9fb5cf10] > >>>>>>>>>>>>>>>>>>>>>>>>> 9: (PrimaryLogPG::do_request(boost::intrusive_ptr&, > >>>>>>>>>>>>>>>>>>>>>>>>> ThreadPool::TPHandle&)+0x77b) [0x561f9fac91eb] > >>>>>>>>>>>>>>>>>>>>>>>>> 10: (OSD::dequeue_op(boost::intrusive_ptr, > >>>>>>>>>>>>>>>>>>>>>>>>> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f7) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9f955bc7] > >>>>>>>>>>>>>>>>>>>>>>>>> 11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr > >>>>>>>>>>>>>>>>>>>>>>>>> const&)+0x57) [0x561f9fbcd947] > >>>>>>>>>>>>>>>>>>>>>>>>> 12: (OSD::ShardedOpWQ::_process(unsigned int, > >>>>>>>>>>>>>>>>>>>>>>>>> ceph::heartbeat_handle_d*)+0x108c) [0x561f9f984d1c] > >>>>>>>>>>>>>>>>>>>>>>>>> 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) > >>>>>>>>>>>>>>>>>>>>>>>>> [0x561f9ff10e6d] > >>>>>>>>>>>>>>>>>>>>>>>>> 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561f9ff12e30] > >>>>>>>>>>>>>>>>>>>>>>>>> 15: (()+0x8064) [0x7f949afcb064] > >>>>>>>>>>>>>>>>>>>>>>>>> 16: (clone()+0x6d) [0x7f949a0bf62d] > >>>>>>>>>>>>>>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS ` is > >>>>>>>>>>>>>>>>>>>>>>>>> needed to interpret this. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> By the time it gets there, something else has gone wrong. The OSD is > >>>>>>>>>>>>>>>>>>>>>>>> adding a snapid/object pair to its "SnapMapper", and discovering that > >>>>>>>>>>>>>>>>>>>>>>>> there are already entries (which it thinks there shouldn't be). > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> You'll need to post more of a log, along with background, if anybody's > >>>>>>>>>>>>>>>>>>>>>>>> going to diagnose it: is there cache tiering on the cluster? 
What is > >>>>>>>>>>>>>>>>>>>>>>>> this pool used for? Were there other errors on this PG in the past? > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> I also notice a separate email about deleting the data; I don't have > >>>>>>>>>>>>>>>>>>>>>>>> any experience with this but you'd probably have to export the PG > >>>>>>>>>>>>>>>>>>>>>>>> using ceph-objectstore-tool and then find a way to delete the object > >>>>>>>>>>>>>>>>>>>>>>>> out of it. I see options to remove both an object and > >>>>>>>>>>>>>>>>>>>>>>>> "remove-clone-metadata" on a particular ID, but I've not used any of > >>>>>>>>>>>>>>>>>>>>>>>> them myself. > >>>>>>>>>>>>>>>>>>>>>>>> -Greg > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>>>>>>>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>>>>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > 