* Problem with query and any operation on PGs
       [not found] <175484591.20170523135449@tlen.pl>
@ 2017-05-23 12:48 ` Łukasz Chrustek
  2017-05-23 14:17   ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 12:48 UTC (permalink / raw)
  To: ceph-devel

Hello,

After a terrible outage caused by the failure of a 10Gbit switch, the ceph
cluster went to HEALTH_ERR (three whole storage servers went offline at the
same time and didn't come back quickly). After the cluster recovered, two
PGs went into an incomplete state; I can't query them and can't do anything
with them that would bring the cluster back to a working state. Here is an
strace of this command: https://pastebin.com/HpNFvR8Z. But... this cluster
isn't entirely off:

[root@cc1 ~]# rbd ls management-vms
os-mongodb1
os-mongodb1-database
os-gitlab-root
os-mongodb1-database2
os-wiki-root
[root@cc1 ~]# rbd ls volumes
^C
[root@cc1 ~]#

and the same for all mon hosts (not pasting all three here):

[root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
os-mongodb1
os-mongodb1-database
os-gitlab-root
os-mongodb1-database2
os-wiki-root
[root@cc1 ~]# rbd -m 192.168.128.1 list volumes
^C
[root@cc1 ~]#

In all other pools from the list I can list images, except for the (most
important) volumes pool.

Funny thing, I can list rbd info for a particular image:

[root@cc1 ~]# rbd info volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
        size 20480 MB in 1280 objects
        order 24 (16384 kB objects)
        block_name_prefix: rbd_data.64a21a0a9acf52
        format: 2
        features: layering
        flags:
        parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
        overlap: 3072 MB

but I can't list the whole content of the volumes pool.

[root@cc1 ~]# ceph osd pool ls
volumes
images
backups
volumes-ssd-intel-s3700
management-vms
.rgw.root
.rgw.control
.rgw
.rgw.gc
.log
.users.uid
.rgw.buckets.index
.users
.rgw.buckets.extra
.rgw.buckets
volumes-cached
cache-ssd

here is ceph osd tree:

ID  WEIGHT    TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7  20.88388 root ssd-intel-s3700
-11   3.19995     host ssd-stor1
 56   0.79999         osd.56               up  1.00000          1.00000
 57   0.79999         osd.57               up  1.00000          1.00000
 58   0.79999         osd.58               up  1.00000          1.00000
 59   0.79999         osd.59               up  1.00000          1.00000
 -9   2.12999     host ssd-stor2
 60   0.70999         osd.60               up  1.00000          1.00000
 61   0.70999         osd.61               up  1.00000          1.00000
 62   0.70999         osd.62               up  1.00000          1.00000
 -8   2.12999     host ssd-stor3
 63   0.70999         osd.63               up  1.00000          1.00000
 64   0.70999         osd.64               up  1.00000          1.00000
 65   0.70999         osd.65               up  1.00000          1.00000
-10   4.19998     host ssd-stor4
 25   0.70000         osd.25               up  1.00000          1.00000
 26   0.70000         osd.26               up  1.00000          1.00000
 27   0.70000         osd.27               up  1.00000          1.00000
 28   0.70000         osd.28               up  1.00000          1.00000
 29   0.70000         osd.29               up  1.00000          1.00000
 24   0.70000         osd.24               up  1.00000          1.00000
-12   3.41199     host ssd-stor5
 73   0.85300         osd.73               up  1.00000          1.00000
 74   0.85300         osd.74               up  1.00000          1.00000
 75   0.85300         osd.75               up  1.00000          1.00000
 76   0.85300         osd.76               up  1.00000          1.00000
-13   3.41199     host ssd-stor6
 77   0.85300         osd.77               up  1.00000          1.00000
 78   0.85300         osd.78               up  1.00000          1.00000
 79   0.85300         osd.79               up  1.00000          1.00000
 80   0.85300         osd.80               up  1.00000          1.00000
-15   2.39999     host ssd-stor7
 90   0.79999         osd.90               up  1.00000          1.00000
 91   0.79999         osd.91               up  1.00000          1.00000
 92   0.79999         osd.92               up  1.00000          1.00000
 -1 167.69969 root default
 -2  33.99994     host stor1
  6   3.39999         osd.6              down        0          1.00000
  7   3.39999         osd.7                up  1.00000          1.00000
  8   3.39999         osd.8                up  1.00000          1.00000
  9   3.39999         osd.9                up  1.00000          1.00000
 10   3.39999         osd.10             down        0          1.00000
 11   3.39999         osd.11             down        0          1.00000
 69   3.39999         osd.69               up  1.00000          1.00000
 70   3.39999         osd.70               up  1.00000          1.00000
 71   3.39999         osd.71             down        0          1.00000
 81   3.39999         osd.81               up  1.00000          1.00000
 -3  20.99991     host stor2
 13   2.09999         osd.13               up  1.00000          1.00000
 12   2.09999         osd.12               up  1.00000          1.00000
 14   2.09999         osd.14               up  1.00000          1.00000
 15   2.09999         osd.15               up  1.00000          1.00000
 16   2.09999         osd.16               up  1.00000          1.00000
 17   2.09999         osd.17               up  1.00000          1.00000
 18   2.09999         osd.18             down        0          1.00000
 19   2.09999         osd.19               up  1.00000          1.00000
 20   2.09999         osd.20               up  1.00000          1.00000
 21   2.09999         osd.21               up  1.00000          1.00000
 -4  25.00000     host stor3
 30   2.50000         osd.30               up  1.00000          1.00000
 31   2.50000         osd.31               up  1.00000          1.00000
 32   2.50000         osd.32               up  1.00000          1.00000
 33   2.50000         osd.33             down        0          1.00000
 34   2.50000         osd.34               up  1.00000          1.00000
 35   2.50000         osd.35               up  1.00000          1.00000
 66   2.50000         osd.66               up  1.00000          1.00000
 67   2.50000         osd.67               up  1.00000          1.00000
 68   2.50000         osd.68               up  1.00000          1.00000
 72   2.50000         osd.72             down        0          1.00000
 -5  25.00000     host stor4
 44   2.50000         osd.44               up  1.00000          1.00000
 45   2.50000         osd.45               up  1.00000          1.00000
 46   2.50000         osd.46             down        0          1.00000
 47   2.50000         osd.47               up  1.00000          1.00000
  0   2.50000         osd.0                up  1.00000          1.00000
  1   2.50000         osd.1                up  1.00000          1.00000
  2   2.50000         osd.2                up  1.00000          1.00000
  3   2.50000         osd.3                up  1.00000          1.00000
  4   2.50000         osd.4                up  1.00000          1.00000
  5   2.50000         osd.5                up  1.00000          1.00000
 -6  14.19991     host stor5
 48   1.79999         osd.48               up  1.00000          1.00000
 49   1.59999         osd.49               up  1.00000          1.00000
 50   1.79999         osd.50               up  1.00000          1.00000
 51   1.79999         osd.51             down        0          1.00000
 52   1.79999         osd.52               up  1.00000          1.00000
 53   1.79999         osd.53               up  1.00000          1.00000
 54   1.79999         osd.54               up  1.00000          1.00000
 55   1.79999         osd.55               up  1.00000          1.00000
-14  14.39999     host stor6
 82   1.79999         osd.82               up  1.00000          1.00000
 83   1.79999         osd.83               up  1.00000          1.00000
 84   1.79999         osd.84               up  1.00000          1.00000
 85   1.79999         osd.85               up  1.00000          1.00000
 86   1.79999         osd.86               up  1.00000          1.00000
 87   1.79999         osd.87               up  1.00000          1.00000
 88   1.79999         osd.88               up  1.00000          1.00000
 89   1.79999         osd.89               up  1.00000          1.00000
-16  12.59999     host stor7
 93   1.79999         osd.93               up  1.00000          1.00000
 94   1.79999         osd.94               up  1.00000          1.00000
 95   1.79999         osd.95               up  1.00000          1.00000
 96   1.79999         osd.96               up  1.00000          1.00000
 97   1.79999         osd.97               up  1.00000          1.00000
 98   1.79999         osd.98               up  1.00000          1.00000
 99   1.79999         osd.99               up  1.00000          1.00000
-17  21.49995     host stor8
 22   1.59999         osd.22               up  1.00000          1.00000
 23   1.59999         osd.23               up  1.00000          1.00000
 36   2.09999         osd.36               up  1.00000          1.00000
 37   2.09999         osd.37               up  1.00000          1.00000
 38   2.50000         osd.38               up  1.00000          1.00000
 39   2.50000         osd.39               up  1.00000          1.00000
 40   2.50000         osd.40               up  1.00000          1.00000
 41   2.50000         osd.41             down        0          1.00000
 42   2.50000         osd.42               up  1.00000          1.00000
 43   1.59999         osd.43               up  1.00000          1.00000
[root@cc1 ~]#

and ceph health detail:

ceph health detail | grep down
HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
undersized; recovery 176211/14148564 objects degraded (1.245%);
recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
pg 1.60 is stuck inactive since forever, current state
down+remapped+peering, last acting [66,69,40]
pg 1.165 is stuck inactive since forever, current state
down+remapped+peering, last acting [37]
pg 1.60 is stuck unclean since forever, current state
down+remapped+peering, last acting [66,69,40]
pg 1.165 is stuck unclean since forever, current state
down+remapped+peering, last acting [37]
pg 1.165 is down+remapped+peering, acting [37]
pg 1.60 is down+remapped+peering, acting [66,69,40]

The problematic pgs are 1.165 and 1.60.

Please advise how to unblock the volumes pool and/or make these two PGs
work. During the last night and day spent trying to solve this issue we
confirmed that these PGs are 100% empty of data.
--
Regards,
Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
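A minimal diagnostic sketch for a situation like this, assuming a
jewel-era CLI; the pg ids are the two from the report above:

  # enumerate the PGs stuck inactive, then see which OSDs each one maps to
  ceph pg dump_stuck inactive
  for pgid in 1.60 1.165; do
      ceph pg map $pgid   # prints: osdmap eN pg <pgid> -> up [...] acting [...]
  done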
* Re: Problem with query and any operation on PGs
  2017-05-23 12:48 ` Problem with query and any operation on PGs Łukasz Chrustek
@ 2017-05-23 14:17   ` Sage Weil
  2017-05-23 14:43     ` Łukasz Chrustek
       [not found]     ` <1464688590.20170523185052@tlen.pl>
  0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-23 14:17 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11624 bytes --]

On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> After a terrible outage caused by the failure of a 10Gbit switch, the ceph
> cluster went to HEALTH_ERR (three whole storage servers went offline at the
> same time and didn't come back quickly). After the cluster recovered, two
> PGs went into an incomplete state; I can't query them and can't do anything
> with them,

The thing where you can't query a PG is because the OSD is throttling
incoming work and the throttle is exhausted (the PG can't do work so it
isn't making progress).  A workaround for jewel is to restart the OSD
serving the PG and do the query quickly after that (probably in a loop so
that you catch it after it starts up but before the throttle is
exhausted again).  (In luminous this is fixed.)

Once you have the query output ('ceph tell $pgid query') you'll be able to
tell what is preventing the PG from peering.

You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.

HTH!
sage

> that would bring the cluster back to a working state. Here is an strace
> of this command: https://pastebin.com/HpNFvR8Z. But... this cluster isn't
> entirely off:
> [rbd listings, pool list, osd tree and ceph health detail quoted in full
> above, snipped]
>
> The problematic pgs are 1.165 and 1.60.
>
> Please advise how to unblock the volumes pool and/or make these two PGs
> work. During the last night and day spent trying to solve this issue we
> confirmed that these PGs are 100% empty of data.
>
> --
> Regards,
> Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
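A minimal sketch of the restart-then-query loop Sage describes above,
assuming jewel with systemd OSD units; pg 1.165 and its acting primary
osd.37 are taken from the thread, and the 10-second deadline is an
arbitrary choice:

  # run on the host that owns osd.37 (stor8 in the osd tree above)
  pgid=1.165
  osd=37
  systemctl restart ceph-osd@$osd
  # poll the query until it answers; it only responds in the short window
  # after startup, before the throttle fills up again
  until timeout 10 ceph tell $pgid query > /tmp/query.$pgid.json; do
      sleep 1
  done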
* Re: Problem with query and any operation on PGs
  2017-05-23 14:17 ` Sage Weil
@ 2017-05-23 14:43   ` Łukasz Chrustek
       [not found]     ` <1464688590.20170523185052@tlen.pl>
  1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 14:43 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
> > [report quoted in full above, snipped]
>
> The thing where you can't query a PG is because the OSD is throttling
> incoming work and the throttle is exhausted (the PG can't do work so it
> isn't making progress).  A workaround for jewel is to restart the OSD
> serving the PG and do the query quickly after that (probably in a loop so
> that you catch it after it starts up but before the throttle is
> exhausted again).  (In luminous this is fixed.)

Thank you for the clarification.

> Once you have the query output ('ceph tell $pgid query') you'll be able to
> tell what is preventing the PG from peering.

Hmm... what kind of loop do you suggest? When I do ceph tell $pgid query,
it hangs and never returns to the console.

> You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.

There is something strange here for 1.165: how is it possible that the
acting set is [37] when that osd isn't in the up set [84,38,48]?

ceph pg map 1.165
osdmap e114855 pg 1.165 (1.165) -> up [84,38,48] acting [37]

The second one is OK, but there is also no way to run a pg query on it:

[root@cc1 ~]# ceph pg map 1.60
osdmap e114855 pg 1.60 (1.60) -> up [66,84,40] acting [66,69,40]

Do I need to restart all three OSDs at the same time?

Can you advise how to unblock access to one of the pools for this kind of
command:

[root@cc1 ~]# rbd ls volumes
^C

The strace for this is here: https://pastebin.com/hpbDg6gP - this time it
hangs on some futex function. Are these two cases (the pg query hang and
the rbd ls problem) connected to each other?

If I find a solution for this, you will make my day (and night :) ).

Regards
Lukasz

> HTH!
> sage
> > [rest of the original report quoted in full above, snipped]

--
Regards,
Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
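A sketch of restarting the whole acting set of pg 1.60 close together and
then retrying the query, assuming systemd OSD units and passwordless ssh;
the osd-to-host mapping is read off the osd tree above (osd.66 on stor3,
osd.69 on stor1, osd.40 on stor8):

  # restart every OSD in the acting set, then query while they are fresh
  ssh stor3 systemctl restart ceph-osd@66
  ssh stor1 systemctl restart ceph-osd@69
  ssh stor8 systemctl restart ceph-osd@40
  timeout 10 ceph tell 1.60 query > /tmp/query.1.60.json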
[parent not found: <1464688590.20170523185052@tlen.pl>]
* Re: Problem with query and any operation on PGs
       [not found] ` <1464688590.20170523185052@tlen.pl>
@ 2017-05-23 17:40   ` Sage Weil
  2017-05-23 21:43     ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-23 17:40 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 13855 bytes --]

On Tue, 23 May 2017, Łukasz Chrustek wrote:
> I haven't slept for over 30 hours, and still can't find a solution. I
> did as you wrote, but turning off these osds
> (https://pastebin.com/1npBXeMV) didn't resolve the issue...

The important bit is:

            "blocked": "peering is blocked due to down osds",
            "down_osds_we_would_probe": [
                6,
                10,
                33,
                37,
                72
            ],
            "peering_blocked_by": [
                {
                    "osd": 6,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let us proceed"
                },
                {
                    "osd": 10,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let us proceed"
                },
                {
                    "osd": 37,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let us proceed"
                },
                {
                    "osd": 72,
                    "current_lost_at": 113771,
                    "comment": "starting or marking this osd lost may let us proceed"
                }
            ]
        },

Are any of those OSDs startable?

sage

> Regards
> Lukasz Chrustek
>
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> > [the earlier exchange and the full original report, quoted in full
> > above, snipped]
>
> --
> Regards,
> Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
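A small sketch for locating the blocking OSDs named in the query output
above and checking whether their daemons run; 'ceph osd find' and the
systemd unit name are jewel-era conventions:

  # report the host and address recorded for each blocking OSD
  for osd in 6 10 33 37 72; do
      ceph osd find $osd
  done
  # then, on the host that owns a given OSD, check its daemon, e.g.:
  systemctl status ceph-osd@6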
* Re: Problem with query and any operation on PGs
  2017-05-23 17:40 ` Sage Weil
@ 2017-05-23 21:43   ` Łukasz Chrustek
  2017-05-23 21:48     ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 21:43 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
> > I haven't slept for over 30 hours, and still can't find a solution. I
> > did as you wrote, but turning off these osds
> > (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>
> The important bit is:
>
> [peering_blocked_by output for osds 6, 10, 33, 37 and 72, quoted in full
> above, snipped]
>
> Are any of those OSDs startable?

They were all up and running, but I decided to shut them down and out
them from ceph. Now it looks like ceph is working OK, but two PGs are
still in the down state; how do I get rid of that?

ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+remapped+peering, last acting [38,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+remapped+peering, acting [38,48]
[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115175: 100 osds: 88 up, 86 in; 2 remapped pgs
      pgmap v67583069: 3520 pgs, 17 pools, 26675 GB data, 4849 kobjects
            76638 GB used, 107 TB / 182 TB avail
                3515 active+clean
                   3 active+clean+scrubbing+deep
                   2 down+remapped+peering
  client io 0 B/s rd, 869 kB/s wr, 14 op/s rd, 113 op/s wr

--
Regards
Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
  2017-05-23 21:43 ` Łukasz Chrustek
@ 2017-05-23 21:48   ` Sage Weil
  2017-05-24 13:19     ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-23 21:48 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3462 bytes --]

On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> > The important bit is:
> >
> > [peering_blocked_by output quoted in full above, snipped]
> >
> > Are any of those OSDs startable?
>
> They were all up and running, but I decided to shut them down and out
> them from ceph. Now it looks like ceph is working OK, but two PGs are
> still in the down state; how do I get rid of that?

If you haven't deleted the data, you should start the OSDs back up.

If they are partially damaged you can use ceph-objectstore-tool to
extract just the PGs in question to make sure you haven't lost anything,
inject them on some other OSD(s) and restart those, and *then* mark the
bad OSDs as 'lost'.

If all else fails, you can just mark those OSDs 'lost', but in doing so
you might be telling the cluster to lose data.

The best thing to do is definitely to get those OSDs started again.

sage

> ceph health detail
> [ceph health detail and ceph -s output quoted in full above, snipped]

^ permalink raw reply	[flat|nested] 35+ messages in thread
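A sketch of the export/import path Sage outlines, assuming a jewel
FileStore layout; osd.72 as the source, pg 1.165 as the PG, osd.38 as the
healthy target, and the file name are all placeholders, and both OSD
daemons must be stopped while the tool runs:

  # on the host holding the failed/outed OSD, with its daemon stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-72 \
      --journal-path /var/lib/ceph/osd/ceph-72/journal \
      --pgid 1.165 --op export --file /tmp/pg1.165.export

  # on a healthy OSD (also stopped), inject the PG and start the daemon
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
      --journal-path /var/lib/ceph/osd/ceph-38/journal \
      --op import --file /tmp/pg1.165.export
  systemctl start ceph-osd@38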
* Re: Problem with query and any operation on PGs
  2017-05-23 21:48 ` Sage Weil
@ 2017-05-24 13:19   ` Łukasz Chrustek
  2017-05-24 13:37     ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 13:19 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
> > They were all up and running, but I decided to shut them down and out
> > them from ceph. Now it looks like ceph is working OK, but two PGs are
> > still in the down state; how do I get rid of that?
>
> If you haven't deleted the data, you should start the OSDs back up.
>
> If they are partially damaged you can use ceph-objectstore-tool to
> extract just the PGs in question to make sure you haven't lost anything,
> inject them on some other OSD(s) and restart those, and *then* mark the
> bad OSDs as 'lost'.
>
> If all else fails, you can just mark those OSDs 'lost', but in doing so
> you might be telling the cluster to lose data.
>
> The best thing to do is definitely to get those OSDs started again.

Now the situation looks like this:

[root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
        size 500 GB in 128000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.ed9d394a851426
        format: 2
        features: layering
        flags:

[root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
(output cut)
rbd_data.ed9d394a851426.000000000000447c
rbd_data.ed9d394a851426.0000000000010857
rbd_data.ed9d394a851426.000000000000ec8b
rbd_data.ed9d394a851426.000000000000fa43
rbd_data.ed9d394a851426.000000000001ef2d
^C

It hangs on this object and doesn't go any further. rbd cp also hangs...
rbd map - also...

Can you advise what the solution for this case could be?

--
Regards,
Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
  2017-05-24 13:19 ` Łukasz Chrustek
@ 2017-05-24 13:37   ` Sage Weil
  2017-05-24 13:58     ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 13:37 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3882 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> [earlier exchange quoted in full above, snipped]
>
> Now the situation looks like this:
>
> [rbd info and rados ls output quoted in full above, snipped]
>
> It hangs on this object and doesn't go any further. rbd cp also hangs...
> rbd map - also...
>
> Can you advise what the solution for this case could be?

The hang is due to OSD throttling (see my first reply for how to work
around that and get a pg query).  But you already did that and the cluster
told you which OSDs it needs to see up in order for it to peer and
recover.  If you haven't destroyed those disks, you should start those
osds and it should be fine.  If you've destroyed the data or the disks are
truly broken and dead, then you can mark those OSDs lost and the cluster
*maybe* recover (but hard to say given the information you've shared).

sage

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
  2017-05-24 13:37 ` Sage Weil
@ 2017-05-24 13:58   ` Łukasz Chrustek
  2017-05-24 14:02     ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 13:58 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
> > [earlier exchange quoted in full above, snipped]
>
> The hang is due to OSD throttling (see my first reply for how to work
> around that and get a pg query).  But you already did that and the cluster
> told you which OSDs it needs to see up in order for it to peer and
> recover.  If you haven't destroyed those disks, you should start those
> osds and it should be fine.  If you've destroyed the data or the disks are
> truly broken and dead, then you can mark those OSDs lost and the cluster
> *maybe* recover (but hard to say given the information you've shared).
>
> sage

What information can I bring to you to determine whether it is
recoverable?

Here are ceph -s and ceph health detail:

[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
      pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
            76705 GB used, 107 TB / 182 TB avail
                4030 active+clean
                   1 down+remapped+peering
                   1 down+peering
  client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
[root@cc1 ~]# ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+peering, acting [67,88,48]
[root@cc1 ~]#

--
Regards,
Łukasz Chrustek

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
  2017-05-24 13:58 ` Łukasz Chrustek
@ 2017-05-24 14:02   ` Sage Weil
  2017-05-24 14:18     ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 14:02 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5806 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> > [peering_blocked_by output quoted in full above, snipped]

These are the osds (6, 10, 37, 72).

> >> >> > Are any of those OSDs startable?

This

> >> >> They were all up and running, but I decided to shut them down and out
> >> >> them from ceph. Now it looks like ceph is working OK, but two PGs are
> >> >> still in the down state; how do I get rid of that?
>
> >> > If you haven't deleted the data, you should start the OSDs back up.

This

> >> > If they are partially damaged you can use ceph-objectstore-tool to
> >> > extract just the PGs in question to make sure you haven't lost anything,
> >> > inject them on some other OSD(s) and restart those, and *then* mark the
> >> > bad OSDs as 'lost'.
> >> >
> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> >> > you might be telling the cluster to lose data.
> >> >
> >> > The best thing to do is definitely to get those OSDs started again.

This

> >> [rbd info and rados ls hang output quoted in full above, snipped]
> >>
> >> Can you advise what the solution for this case could be?
>
> > The hang is due to OSD throttling (see my first reply for how to work
> > around that and get a pg query).  But you already did that and the cluster
> > told you which OSDs it needs to see up in order for it to peer and
> > recover.  If you haven't destroyed those disks, you should start those
> > osds and it should be fine.  If you've destroyed the data or the disks are
> > truly broken and dead, then you can mark those OSDs lost and the cluster
> > *maybe* recover (but hard to say given the information you've shared).

This

> > sage
>
> What information can I bring to you to determine whether it is
> recoverable?
>
> [ceph -s and ceph health detail output quoted in full above, snipped]

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 14:02 ` Sage Weil @ 2017-05-24 14:18 ` Łukasz Chrustek 2017-05-24 14:47 ` Sage Weil 0 siblings, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 14:18 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Cześć, > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> Cześć, >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> Cześć, >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: >> >> >> Cześć, >> >> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: >> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I >> >> >> >> did, as You wrote, but turning off this >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue... >> >> >> >> >> >> > The important bit is: >> >> >> >> >> >> > "blocked": "peering is blocked due to down osds", >> >> >> > "down_osds_we_would_probe": [ >> >> >> > 6, >> >> >> > 10, >> >> >> > 33, >> >> >> > 37, >> >> >> > 72 >> >> >> > ], >> >> >> > "peering_blocked_by": [ >> >> >> > { >> >> >> > "osd": 6, >> >> >> > "current_lost_at": 0, >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> > us proceed" >> >> >> > }, >> >> >> > { >> >> >> > "osd": 10, >> >> >> > "current_lost_at": 0, >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> > us proceed" >> >> >> > }, >> >> >> > { >> >> >> > "osd": 37, >> >> >> > "current_lost_at": 0, >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> > us proceed" >> >> >> > }, >> >> >> > { >> >> >> > "osd": 72, >> >> >> > "current_lost_at": 113771, >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> > us proceed" > These are the osds (6, 10, 37, 72). >> >> >> > } >> >> >> > ] >> >> >> > }, >> >> >> >> >> >> > Are any of those OSDs startable? > This osd 6 - isn't startable. osd 10, 37, 72 are startable >> >> >> >> >> >> They were all up and running - but I decided to shut them down and out >> >> >> them from ceph, now it looks like ceph working ok, but still two PGs >> >> >> are in down state, how to get rid of it ? >> >> >> >> > If you haven't deleted the data, you should start the OSDs back up. > This By OSD backup do You mean copying /var/lib/ceph/osd/ceph-72/* to some other (non-ceph) disk? >> >> >> >> > If they are partially damaged you can use ceph-objectstore-tool to >> >> > extract just the PGs in question to make sure you haven't lost anything, >> >> > inject them on some other OSD(s) and restart those, and *then* mark the >> >> > bad OSDs as 'lost'. >> >> >> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so >> >> > you might be telling the cluster to lose data. >> >> >> >> > The best thing to do is definitely to get those OSDs started again. > This There were actions on these PGs that ended up destroying them. I started those osds (the three which are startable) - this didn't solve the situation. I should add that there are other pools on this cluster; the problem is only with the pool that holds the broken/down PGs.
>> >> >> >> Now situation looks like this: >> >> >> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b >> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b': >> >> size 500 GB in 128000 objects >> >> order 22 (4096 kB objects) >> >> block_name_prefix: rbd_data.ed9d394a851426 >> >> format: 2 >> >> features: layering >> >> flags: >> >> >> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426 >> >> (output cutted) >> >> rbd_data.ed9d394a851426.000000000000447c >> >> rbd_data.ed9d394a851426.0000000000010857 >> >> rbd_data.ed9d394a851426.000000000000ec8b >> >> rbd_data.ed9d394a851426.000000000000fa43 >> >> rbd_data.ed9d394a851426.000000000001ef2d >> >> ^C >> >> >> >> it hangs on this object and isn't going further. rbd cp also hangs... >> >> rbd map - also... >> >> >> >> can You advice what can be solution for this case ? >> >> > The hang is due to OSD throttling (see my first reply for how to wrok >> > around that and get a pg query). But you already did that and the cluster >> > told you which OSDs it needs to see up in order for it to peer and >> > recover. If you haven't destroyed those disks, you should start those >> > osds and it shoudl be fine. If you've destroyed the data or the disks are >> > truly broken and dead, then you can mark those OSDs lost and the cluster >> > *maybe* recover (but hard to say given the information you've shared). > This [root@cc1 ~]# ceph osd lost 10 --yes-i-really-mean-it marked osd lost in epoch 115310 [root@cc1 ~]# ceph osd lost 37 --yes-i-really-mean-it marked osd lost in epoch 115314 [root@cc1 ~]# ceph osd lost 72 --yes-i-really-mean-it marked osd lost in epoch 115317 [root@cc1 ~]# ceph -s cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 health HEALTH_WARN 2 pgs down 2 pgs peering 2 pgs stuck inactive monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} election epoch 872, quorum 0,1,2 cc1,cc2,cc3 osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs pgmap v67642483: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects 76718 GB used, 107 TB / 182 TB avail 4030 active+clean 1 down+remapped+peering 1 down+peering client io 14624 kB/s rd, 31619 kB/s wr, 382 op/s rd, 228 op/s wr [root@cc1 ~]# ceph -s cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 health HEALTH_WARN 2 pgs down 2 pgs peering 2 pgs stuck inactive monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} election epoch 872, quorum 0,1,2 cc1,cc2,cc3 osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs pgmap v67642485: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects 76718 GB used, 107 TB / 182 TB avail 4030 active+clean 1 down+remapped+peering 1 down+peering client io 17805 kB/s rd, 18787 kB/s wr, 215 op/s rd, 107 op/s wr >> >> > sage >> >> What information I can bring to You to say it is recoverable ? 
>> >> here are ceph -s and ceph health detail: >> >> [root@cc1 ~]# ceph -s >> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 >> health HEALTH_WARN >> 2 pgs down >> 2 pgs peering >> 2 pgs stuck inactive >> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} >> election epoch 872, quorum 0,1,2 cc1,cc2,cc3 >> osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs >> pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects >> 76705 GB used, 107 TB / 182 TB avail >> 4030 active+clean >> 1 down+remapped+peering >> 1 down+peering >> client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr >> [root@cc1 ~]# ceph health detail >> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive >> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48] >> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40] >> pg 1.60 is down+remapped+peering, acting [66,40] >> pg 1.165 is down+peering, acting [67,88,48] >> [root@cc1 ~]# >> >> -- >> Regards, >> Łukasz Chrustek >> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- Pozdrowienia, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 14:18 ` Łukasz Chrustek @ 2017-05-24 14:47 ` Sage Weil 2017-05-24 15:00 ` Łukasz Chrustek 2017-05-24 21:38 ` Łukasz Chrustek 0 siblings, 2 replies; 35+ messages in thread From: Sage Weil @ 2017-05-24 14:47 UTC (permalink / raw) To: Łukasz Chrustek; +Cc: ceph-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 9357 bytes --] On Wed, 24 May 2017, Łukasz Chrustek wrote: > Cześć, > > > On Wed, 24 May 2017, Łukasz Chrustek wrote: > >> Cześć, > >> > >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: > >> >> Cześć, > >> >> > >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: > >> >> >> Cześć, > >> >> >> > >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: > >> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I > >> >> >> >> did, as You wrote, but turning off this > >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue... > >> >> >> > >> >> >> > The important bit is: > >> >> >> > >> >> >> > "blocked": "peering is blocked due to down osds", > >> >> >> > "down_osds_we_would_probe": [ > >> >> >> > 6, > >> >> >> > 10, > >> >> >> > 33, > >> >> >> > 37, > >> >> >> > 72 > >> >> >> > ], > >> >> >> > "peering_blocked_by": [ > >> >> >> > { > >> >> >> > "osd": 6, > >> >> >> > "current_lost_at": 0, > >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> > us proceed" > >> >> >> > }, > >> >> >> > { > >> >> >> > "osd": 10, > >> >> >> > "current_lost_at": 0, > >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> > us proceed" > >> >> >> > }, > >> >> >> > { > >> >> >> > "osd": 37, > >> >> >> > "current_lost_at": 0, > >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> > us proceed" > >> >> >> > }, > >> >> >> > { > >> >> >> > "osd": 72, > >> >> >> > "current_lost_at": 113771, > >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> > us proceed" > > > These are the osds (6, 10, 37, 72). > > >> >> >> > } > >> >> >> > ] > >> >> >> > }, > >> >> >> > >> >> >> > Are any of those OSDs startable? > > > This > > osd 6 - isn't startable Disk completely 100% dead, or just broken enough that ceph-osd won't start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs from this osd to recover any important writes on that osd. > osd 10, 37, 72 are startable With those started, I'd repeat the original sequence and get a fresh pg query to confirm that it still wants just osd.6. use ceph-objectstore-tool to export the pg from osd.6, stop some other random osd (not one of these ones), import the pg into that osd, and start again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that point. repeat with the same basic process with the other pg. s > > >> >> >> > >> >> >> They were all up and running - but I decided to shut them down and out > >> >> >> them from ceph, now it looks like ceph working ok, but still two PGs > >> >> >> are in down state, how to get rid of it ? > >> >> > >> >> > If you haven't deleted the data, you should start the OSDs back up. > > > This > > By OSD backup do You mean copying /var/lib/ceph/osd/ceph-72/* to some other > (non-ceph) disk? > > >> >> > >> >> > If they are partially damaged you can use ceph-objectstore-tool to > >> >> > extract just the PGs in question to make sure you haven't lost anything, > >> >> > inject them on some other OSD(s) and restart those, and *then* mark the > >> >> > bad OSDs as 'lost'.
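A sketch of that export/import sequence with ceph-objectstore-tool, assuming the usual FileStore layout (the target osd.84 and the export file path are illustrative, and both OSDs must be stopped while the tool runs):

    # on the host holding the broken osd.6, with its ceph-osd stopped:
    ceph-objectstore-tool --op export --pgid 1.165 \
        --data-path /var/lib/ceph/osd/ceph-6 \
        --journal-path /var/lib/ceph/osd/ceph-6/journal \
        --file /root/1.165.export
    # stop a healthy osd, import the pg there, then start it again:
    ceph-objectstore-tool --op import \
        --data-path /var/lib/ceph/osd/ceph-84 \
        --journal-path /var/lib/ceph/osd/ceph-84/journal \
        --file /root/1.165.export
    service ceph start osd.84      # or: systemctl start ceph-osd@84
    # only once the imported copy is up:
    ceph osd lost 6 --yes-i-really-mean-it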
> >> >> > >> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so > >> >> > you might be telling the cluster to lose data. > >> >> > >> >> > The best thing to do is definitely to get those OSDs started again. > > > This > > There were actions on this PGs, that make them destroy. I started this > osds (these three, which are startable) - this dosn't solved > situation. I need to add, that on this cluster are other pools, only > with pool with broken/down PGs is problem. > >> >> > >> >> Now situation looks like this: > >> >> > >> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b > >> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b': > >> >> size 500 GB in 128000 objects > >> >> order 22 (4096 kB objects) > >> >> block_name_prefix: rbd_data.ed9d394a851426 > >> >> format: 2 > >> >> features: layering > >> >> flags: > >> >> > >> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426 > >> >> (output cutted) > >> >> rbd_data.ed9d394a851426.000000000000447c > >> >> rbd_data.ed9d394a851426.0000000000010857 > >> >> rbd_data.ed9d394a851426.000000000000ec8b > >> >> rbd_data.ed9d394a851426.000000000000fa43 > >> >> rbd_data.ed9d394a851426.000000000001ef2d > >> >> ^C > >> >> > >> >> it hangs on this object and isn't going further. rbd cp also hangs... > >> >> rbd map - also... > >> >> > >> >> can You advice what can be solution for this case ? > >> > >> > The hang is due to OSD throttling (see my first reply for how to wrok > >> > around that and get a pg query). But you already did that and the cluster > >> > told you which OSDs it needs to see up in order for it to peer and > >> > recover. If you haven't destroyed those disks, you should start those > > >> > osds and it shoudl be fine. If you've destroyed the data or the disks are > >> > truly broken and dead, then you can mark those OSDs lost and the cluster > >> > *maybe* recover (but hard to say given the information you've shared). > > > This > > > [root@cc1 ~]# ceph osd lost 10 --yes-i-really-mean-it > marked osd lost in epoch 115310 > [root@cc1 ~]# ceph osd lost 37 --yes-i-really-mean-it > marked osd lost in epoch 115314 > [root@cc1 ~]# ceph osd lost 72 --yes-i-really-mean-it > marked osd lost in epoch 115317 > [root@cc1 ~]# ceph -s > cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 > health HEALTH_WARN > 2 pgs down > 2 pgs peering > 2 pgs stuck inactive > monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} > election epoch 872, quorum 0,1,2 cc1,cc2,cc3 > osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs > pgmap v67642483: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects > 76718 GB used, 107 TB / 182 TB avail > 4030 active+clean > 1 down+remapped+peering > 1 down+peering > client io 14624 kB/s rd, 31619 kB/s wr, 382 op/s rd, 228 op/s wr > [root@cc1 ~]# ceph -s > cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 > health HEALTH_WARN > 2 pgs down > 2 pgs peering > 2 pgs stuck inactive > monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} > election epoch 872, quorum 0,1,2 cc1,cc2,cc3 > osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs > pgmap v67642485: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects > 76718 GB used, 107 TB / 182 TB avail > 4030 active+clean > 1 down+remapped+peering > 1 down+peering > client io 17805 kB/s rd, 18787 kB/s wr, 215 op/s rd, 107 op/s wr > > >> > >> > sage > >> > >> What information I can bring to You to say it is recoverable ? 
> >> > >> here are ceph -s and ceph health detail: > >> > >> [root@cc1 ~]# ceph -s > >> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 > >> health HEALTH_WARN > >> 2 pgs down > >> 2 pgs peering > >> 2 pgs stuck inactive > >> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} > >> election epoch 872, quorum 0,1,2 cc1,cc2,cc3 > >> osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs > >> pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects > >> 76705 GB used, 107 TB / 182 TB avail > >> 4030 active+clean > >> 1 down+remapped+peering > >> 1 down+peering > >> client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr > >> [root@cc1 ~]# ceph health detail > >> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive > >> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48] > >> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40] > >> pg 1.60 is down+remapped+peering, acting [66,40] > >> pg 1.165 is down+peering, acting [67,88,48] > >> [root@cc1 ~]# > >> > >> -- > >> Regards, > >> Łukasz Chrustek > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >> > > > > -- > Pozdrowienia, > Łukasz Chrustek > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 14:47 ` Sage Weil @ 2017-05-24 15:00 ` Łukasz Chrustek 2017-05-24 15:07 ` Łukasz Chrustek 2017-05-24 15:11 ` Sage Weil 2017-05-24 21:38 ` Łukasz Chrustek 1 sibling, 2 replies; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 15:00 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> Cześć, >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> Cześć, >> >> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> >> Cześć, >> >> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: >> >> >> >> Cześć, >> >> >> >> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: >> >> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I >> >> >> >> >> did, as You wrote, but turning off this >> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue... >> >> >> >> >> >> >> >> > The important bit is: >> >> >> >> >> >> >> >> > "blocked": "peering is blocked due to down osds", >> >> >> >> > "down_osds_we_would_probe": [ >> >> >> >> > 6, >> >> >> >> > 10, >> >> >> >> > 33, >> >> >> >> > 37, >> >> >> >> > 72 >> >> >> >> > ], >> >> >> >> > "peering_blocked_by": [ >> >> >> >> > { >> >> >> >> > "osd": 6, >> >> >> >> > "current_lost_at": 0, >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> > us proceed" >> >> >> >> > }, >> >> >> >> > { >> >> >> >> > "osd": 10, >> >> >> >> > "current_lost_at": 0, >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> > us proceed" >> >> >> >> > }, >> >> >> >> > { >> >> >> >> > "osd": 37, >> >> >> >> > "current_lost_at": 0, >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> > us proceed" >> >> >> >> > }, >> >> >> >> > { >> >> >> >> > "osd": 72, >> >> >> >> > "current_lost_at": 113771, >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> > us proceed" >> >> > These are the osds (6, 10, 37, 72). >> >> >> >> >> > } >> >> >> >> > ] >> >> >> >> > }, >> >> >> >> >> >> >> >> > Are any of those OSDs startable? >> >> > This >> >> osd 6 - isn't startable > Disk completely 100% dead, or just borken enough that ceph-osd won't > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs > from this osd to recover any important writes on that osd. 
2017-05-24 11:21:23.341938 7f6830a36940 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375 2017-05-24 11:21:23.350180 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e) 2017-05-24 11:21:23.350610 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work 2017-05-24 11:21:23.350617 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option 2017-05-24 11:21:23.350633 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported 2017-05-24 11:21:23.351897 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2017-05-24 11:21:23.351951 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported 2017-05-24 11:21:23.351970 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists 2017-05-24 11:21:23.351981 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported 2017-05-24 11:21:23.351984 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted 2017-05-24 11:21:23.351987 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed 2017-05-24 11:21:23.351996 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING 2017-05-24 11:21:23.352573 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877) 2017-05-24 11:21:23.353001 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported 2017-05-24 11:21:23.353012 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test 2017-05-24 11:21:23.353016 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted 2017-05-24 11:21:23.353021 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported 2017-05-24 11:21:23.353022 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted 2017-05-24 11:21:23.353027 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted 2017-05-24 11:21:23.355156 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock 2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m it is all I get for this osd in logs, when I try to start it. 
>> osd 10, 37, 72 are startable > With those started, I'd repeat the original sequence and get a fresh pg > query to confirm that it still wants just osd.6. You mean the procedure with the loop, taking down the OSDs which the broken PGs point to? pg 1.60 is down+remapped+peering, acting [66,40] pg 1.165 is down+peering, acting [67,88,48] for pg 1.60 <--> take 66 down, then check pg query in a loop? > use ceph-objectstore-tool to export the pg from osd.6, stop some other > random osd (not one of these ones), import the pg into that osd, and start > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that > point. repeat with the same basic process with the other pg. I have already done 'ceph osd lost 6'; do I need to do it once again? -- Regards Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 15:00 ` Łukasz Chrustek @ 2017-05-24 15:07 ` Łukasz Chrustek 0 siblings, 0 replies; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 15:07 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel > it is all I get for this osd in logs, when I try to start it. >>> osd 10, 37, 72 are startable >> With those started, I'd repeat the original sequence and get a fresh pg >> query to confirm that it still wants just osd.6. > You mean the procedure with the loop, taking down the OSDs which the broken > PGs point to? > pg 1.60 is down+remapped+peering, acting [66,40] > pg 1.165 is down+peering, acting [67,88,48] > for pg 1.60 <--> take 66 down, then check pg query in a loop? >> use ceph-objectstore-tool to export the pg from osd.6, stop some other >> random osd (not one of these ones), import the pg into that osd, and start >> again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that >> point. repeat with the same basic process with the other pg. > I have already done 'ceph osd lost 6'; do I need to do it once again? /dev/sdb1 3,7T 34M 3,7T 1% /var/lib/ceph/osd/ceph-6 This disk has no data; it was migrated away while this osd could still come up. -- Regards, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
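A quick way to double-check whether a disk really holds nothing for the stuck pgs is to look for their FileStore directories, as is done later in this thread (a sketch; the paths assume the default osd data dir):

    ls -d /var/lib/ceph/osd/ceph-6/current/1.60_head \
          /var/lib/ceph/osd/ceph-6/current/1.165_head 2>/dev/null
    du -sh /var/lib/ceph/osd/ceph-6/current/1.*_head 2>/dev/null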
* Re: Problem with query and any operation on PGs 2017-05-24 15:00 ` Łukasz Chrustek 2017-05-24 15:07 ` Łukasz Chrustek @ 2017-05-24 15:11 ` Sage Weil 2017-05-24 15:24 ` Łukasz Chrustek 2017-05-24 15:54 ` Łukasz Chrustek 1 sibling, 2 replies; 35+ messages in thread From: Sage Weil @ 2017-05-24 15:11 UTC (permalink / raw) To: Łukasz Chrustek; +Cc: ceph-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 7223 bytes --] On Wed, 24 May 2017, Łukasz Chrustek wrote: > Hello, > > > On Wed, 24 May 2017, Łukasz Chrustek wrote: > >> Cześć, > >> > >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: > >> >> Cześć, > >> >> > >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: > >> >> >> Cześć, > >> >> >> > >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: > >> >> >> >> Cześć, > >> >> >> >> > >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: > >> >> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I > >> >> >> >> >> did, as You wrote, but turning off this > >> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue... > >> >> >> >> > >> >> >> >> > The important bit is: > >> >> >> >> > >> >> >> >> > "blocked": "peering is blocked due to down osds", > >> >> >> >> > "down_osds_we_would_probe": [ > >> >> >> >> > 6, > >> >> >> >> > 10, > >> >> >> >> > 33, > >> >> >> >> > 37, > >> >> >> >> > 72 > >> >> >> >> > ], > >> >> >> >> > "peering_blocked_by": [ > >> >> >> >> > { > >> >> >> >> > "osd": 6, > >> >> >> >> > "current_lost_at": 0, > >> >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> >> > us proceed" > >> >> >> >> > }, > >> >> >> >> > { > >> >> >> >> > "osd": 10, > >> >> >> >> > "current_lost_at": 0, > >> >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> >> > us proceed" > >> >> >> >> > }, > >> >> >> >> > { > >> >> >> >> > "osd": 37, > >> >> >> >> > "current_lost_at": 0, > >> >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> >> > us proceed" > >> >> >> >> > }, > >> >> >> >> > { > >> >> >> >> > "osd": 72, > >> >> >> >> > "current_lost_at": 113771, > >> >> >> >> > "comment": "starting or marking this osd lost may let > >> >> >> >> > us proceed" > >> > >> > These are the osds (6, 10, 37, 72). > >> > >> >> >> >> > } > >> >> >> >> > ] > >> >> >> >> > }, > >> >> >> >> > >> >> >> >> > Are any of those OSDs startable? > >> > >> > This > >> > >> osd 6 - isn't startable > > > Disk completely 100% dead, or just borken enough that ceph-osd won't > > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs > > from this osd to recover any important writes on that osd. 
> > 2017-05-24 11:21:23.341938 7f6830a36940 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375 > 2017-05-24 11:21:23.350180 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e) > 2017-05-24 11:21:23.350610 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work > 2017-05-24 11:21:23.350617 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option > 2017-05-24 11:21:23.350633 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported > 2017-05-24 11:21:23.351897 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) > 2017-05-24 11:21:23.351951 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported > 2017-05-24 11:21:23.351970 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists > 2017-05-24 11:21:23.351981 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported > 2017-05-24 11:21:23.351984 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted > 2017-05-24 11:21:23.351987 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed > 2017-05-24 11:21:23.351996 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING > 2017-05-24 11:21:23.352573 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877) > 2017-05-24 11:21:23.353001 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported > 2017-05-24 11:21:23.353012 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test > 2017-05-24 11:21:23.353016 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted > 2017-05-24 11:21:23.353021 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported > 2017-05-24 11:21:23.353022 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted > 2017-05-24 11:21:23.353027 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted > 2017-05-24 11:21:23.355156 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled > 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory > 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock > 2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m > > it is all I get for this osd in logs, when I try to start it. 
> >> osd 10, 37, 72 are startable > > With those started, I'd repeat the original sequence and get a fresh pg > > query to confirm that it still wants just osd.6. > > You mean the procedure with the loop, taking down the OSDs which the broken > PGs point to? > pg 1.60 is down+remapped+peering, acting [66,40] > pg 1.165 is down+peering, acting [67,88,48] > > for pg 1.60 <--> take 66 down, then check pg query in a loop? Right. > > use ceph-objectstore-tool to export the pg from osd.6, stop some other > > random osd (not one of these ones), import the pg into that osd, and start > > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that > > point. repeat with the same basic process with the other pg. > > I have already done 'ceph osd lost 6'; do I need to do it once again? Hmm, not sure; if the OSD is empty then there is no harm in doing it again. Try that first since it might resolve it. If not, do the query loop above. s ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 15:11 ` Sage Weil @ 2017-05-24 15:24 ` Łukasz Chrustek 2017-05-24 15:54 ` Łukasz Chrustek 1 sibling, 0 replies; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 15:24 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, >> >> >> osd 10, 37, 72 are startable >> >> > With those started, I'd repeat the original sequence and get a fresh pg >> > query to confirm that it still wants just osd.6. >> >> You mean about procedure with loop and taking down OSDs, which broken >> PGs are pointing to ? >> pg 1.60 is down+remapped+peering, acting [66,40] >> pg 1.165 is down+peering, acting [67,88,48] >> >> for pg 1.60 <--> 66 down, then in loop check pg query ? > Right. >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other >> > ranodm osd (not one of these ones), import the pg into that osd, and start >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that >> > point. repeat with the same basic process with the other pg. >> >> I have already did 'ceph osd lost 6', do I need to do this once again ? > Hmm not sure, if the OSD is empty then there is no harm in doing it again. > Try that first since it might resolve it. If not, do the query loop > above. [root@cc1 ~]# ceph osd lost 6 --yes-i-really-mean-it marked osd lost in epoch 113414 [root@cc1 ~]# [root@cc1 ~]# ceph -s cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60 health HEALTH_WARN 2 pgs down 2 pgs peering 2 pgs stuck inactive monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0} election epoch 872, quorum 0,1,2 cc1,cc2,cc3 osdmap e115449: 100 osds: 88 up, 86 in; 1 remapped pgs pgmap v67646402: 4032 pgs, 18 pools, 26733 GB data, 4862 kobjects 76759 GB used, 107 TB / 182 TB avail 4030 active+clean 1 down+peering 1 down+remapped+peering client io 57154 kB/s rd, 1189 kB/s wr, 95 op/s There is no action after marking again this osd as lost. -- Regards, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 15:11 ` Sage Weil 2017-05-24 15:24 ` Łukasz Chrustek @ 2017-05-24 15:54 ` Łukasz Chrustek 2017-05-24 16:02 ` Łukasz Chrustek 1 sibling, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 15:54 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> Hello, >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> Cześć, >> >> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> >> Cześć, >> >> >> >> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> >> >> Cześć, >> >> >> >> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: >> >> >> >> >> Cześć, >> >> >> >> >> >> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote: >> >> >> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I >> >> >> >> >> >> did, as You wrote, but turning off this >> >> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue... >> >> >> >> >> >> >> >> >> >> > The important bit is: >> >> >> >> >> >> >> >> >> >> > "blocked": "peering is blocked due to down osds", >> >> >> >> >> > "down_osds_we_would_probe": [ >> >> >> >> >> > 6, >> >> >> >> >> > 10, >> >> >> >> >> > 33, >> >> >> >> >> > 37, >> >> >> >> >> > 72 >> >> >> >> >> > ], >> >> >> >> >> > "peering_blocked_by": [ >> >> >> >> >> > { >> >> >> >> >> > "osd": 6, >> >> >> >> >> > "current_lost_at": 0, >> >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> >> > us proceed" >> >> >> >> >> > }, >> >> >> >> >> > { >> >> >> >> >> > "osd": 10, >> >> >> >> >> > "current_lost_at": 0, >> >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> >> > us proceed" >> >> >> >> >> > }, >> >> >> >> >> > { >> >> >> >> >> > "osd": 37, >> >> >> >> >> > "current_lost_at": 0, >> >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> >> > us proceed" >> >> >> >> >> > }, >> >> >> >> >> > { >> >> >> >> >> > "osd": 72, >> >> >> >> >> > "current_lost_at": 113771, >> >> >> >> >> > "comment": "starting or marking this osd lost may let >> >> >> >> >> > us proceed" >> >> >> >> > These are the osds (6, 10, 37, 72). >> >> >> >> >> >> >> > } >> >> >> >> >> > ] >> >> >> >> >> > }, >> >> >> >> >> >> >> >> >> >> > Are any of those OSDs startable? >> >> >> >> > This >> >> >> >> osd 6 - isn't startable >> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs >> > from this osd to recover any important writes on that osd. 
>> >> 2017-05-24 11:21:23.341938 7f6830a36940 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375 >> 2017-05-24 11:21:23.350180 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e) >> 2017-05-24 11:21:23.350610 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work >> 2017-05-24 11:21:23.350617 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option >> 2017-05-24 11:21:23.350633 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported >> 2017-05-24 11:21:23.351897 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) >> 2017-05-24 11:21:23.351951 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported >> 2017-05-24 11:21:23.351970 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists >> 2017-05-24 11:21:23.351981 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported >> 2017-05-24 11:21:23.351984 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted >> 2017-05-24 11:21:23.351987 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed >> 2017-05-24 11:21:23.351996 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING >> 2017-05-24 11:21:23.352573 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877) >> 2017-05-24 11:21:23.353001 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported >> 2017-05-24 11:21:23.353012 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test >> 2017-05-24 11:21:23.353016 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted >> 2017-05-24 11:21:23.353021 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported >> 2017-05-24 11:21:23.353022 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted >> 2017-05-24 11:21:23.353027 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted >> 2017-05-24 11:21:23.355156 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled >> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory >> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock >> 2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m >> >> it is all I get for this osd in logs, when I try to start it. 
>> >> >> osd 10, 37, 72 are startable >> >> > With those started, I'd repeat the original sequence and get a fresh pg >> > query to confirm that it still wants just osd.6. >> >> You mean the procedure with the loop, taking down the OSDs which the broken >> PGs point to? >> pg 1.60 is down+remapped+peering, acting [66,40] >> pg 1.165 is down+peering, acting [67,88,48] >> >> for pg 1.60 <--> take 66 down, then check pg query in a loop? > Right. And now it is very weird.... I brought osd.37 up, and the loop while true; do ceph tell 1.165 query; done caught this: https://pastebin.com/zKu06fJn Can You tell what is wrong now? >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other >> > random osd (not one of these ones), import the pg into that osd, and start >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that >> > point. repeat with the same basic process with the other pg. >> >> I have already done 'ceph osd lost 6'; do I need to do it once again? > Hmm, not sure; if the OSD is empty then there is no harm in doing it again. > Try that first since it might resolve it. If not, do the query loop > above. > s -- Regards, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 15:54 ` Łukasz Chrustek @ 2017-05-24 16:02 ` Łukasz Chrustek 2017-05-24 17:07 ` Łukasz Chrustek 0 siblings, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 16:02 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, > And now it is very weird.... I brought osd.37 up, and the loop > while true; do ceph tell 1.165 query; done I need to explain more here - all I did was start ceph-osd id=37 on the storage node; in ceph osd tree this osd is marked as out: -17 21.49995 host stor8 22 1.59999 osd.22 up 1.00000 1.00000 23 1.59999 osd.23 up 1.00000 1.00000 36 2.09999 osd.36 up 1.00000 1.00000 37 2.09999 osd.37 up 0 1.00000 38 2.50000 osd.38 up 1.00000 1.00000 39 2.50000 osd.39 up 1.00000 1.00000 40 2.50000 osd.40 up 0 1.00000 41 2.50000 osd.41 down 0 1.00000 42 2.50000 osd.42 up 1.00000 1.00000 43 1.59999 osd.43 up 1.00000 1.00000 after starting this osd, ceph tell 1.165 query worked for only one call of the command > caught this: > https://pastebin.com/zKu06fJn > Can You tell what is wrong now? >>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other >>> > random osd (not one of these ones), import the pg into that osd, and start >>> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that >>> > point. repeat with the same basic process with the other pg. >>> >>> I have already done 'ceph osd lost 6'; do I need to do it once again? >> Hmm, not sure; if the OSD is empty then there is no harm in doing it again. >> Try that first since it might resolve it. If not, do the query loop >> above. >> s -- Pozdrowienia, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 16:02 ` Łukasz Chrustek @ 2017-05-24 17:07 ` Łukasz Chrustek 2017-05-24 17:16 ` Sage Weil 0 siblings, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 17:07 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel >> And now it is very weird.... I made osd.37 up, and loop >> while true;do; ceph tell 1.165 query ;done > Here need to explain more - all I did was start ceph-osd id=37 on > storage node, in ceph osd tree this osd osd is marked as out: > -17 21.49995 host stor8 > 22 1.59999 osd.22 up 1.00000 1.00000 > 23 1.59999 osd.23 up 1.00000 1.00000 > 36 2.09999 osd.36 up 1.00000 1.00000 > 37 2.09999 osd.37 up 0 1.00000 > 38 2.50000 osd.38 up 1.00000 1.00000 > 39 2.50000 osd.39 up 1.00000 1.00000 > 40 2.50000 osd.40 up 0 1.00000 > 41 2.50000 osd.41 down 0 1.00000 > 42 2.50000 osd.42 up 1.00000 1.00000 > 43 1.59999 osd.43 up 1.00000 1.00000 > after start of this osd, ceph tell 1.165 query worked only for one call of this command >> catch this: >> https://pastebin.com/zKu06fJn here is for pg 1.60: https://pastebin.com/Xuk5iFXr >> Can You tell, what is wrong now ? >>>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other >>>> > ranodm osd (not one of these ones), import the pg into that osd, and start >>>> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that >>>> > point. repeat with the same basic process with the other pg. >>>> >>>> I have already did 'ceph osd lost 6', do I need to do this once again ? >>> Hmm not sure, if the OSD is empty then there is no harm in doing it again. >>> Try that first since it might resolve it. If not, do the query loop >>> above. >>> s -- Pozdrowienia, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 17:07 ` Łukasz Chrustek @ 2017-05-24 17:16 ` Sage Weil 2017-05-24 17:28 ` Łukasz Chrustek 2017-05-24 17:30 ` Łukasz Chrustek 0 siblings, 2 replies; 35+ messages in thread From: Sage Weil @ 2017-05-24 17:16 UTC (permalink / raw) To: Łukasz Chrustek; +Cc: ceph-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 1424 bytes --] On Wed, 24 May 2017, Łukasz Chrustek wrote: > > >> And now it is very weird.... I made osd.37 up, and loop > >> while true;do; ceph tell 1.165 query ;done > > > Here need to explain more - all I did was start ceph-osd id=37 on > > storage node, in ceph osd tree this osd osd is marked as out: > > > > -17 21.49995 host stor8 > > 22 1.59999 osd.22 up 1.00000 1.00000 > > 23 1.59999 osd.23 up 1.00000 1.00000 > > 36 2.09999 osd.36 up 1.00000 1.00000 > > 37 2.09999 osd.37 up 0 1.00000 > > 38 2.50000 osd.38 up 1.00000 1.00000 > > 39 2.50000 osd.39 up 1.00000 1.00000 > > 40 2.50000 osd.40 up 0 1.00000 > > 41 2.50000 osd.41 down 0 1.00000 > > 42 2.50000 osd.42 up 1.00000 1.00000 > > 43 1.59999 osd.43 up 1.00000 1.00000 > > > after start of this osd, ceph tell 1.165 query worked only for one call of this command > >> catch this: > > >> https://pastebin.com/zKu06fJn > > here is for pg 1.60: > > https://pastebin.com/Xuk5iFXr Look at the bottom, after it says "blocked": "peering is blocked due to down osds", Did the 1.165 pg recover? sage ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 17:16 ` Sage Weil @ 2017-05-24 17:28 ` Łukasz Chrustek 2017-05-24 18:16 ` Sage Weil 2017-05-24 17:30 ` Łukasz Chrustek 1 sibling, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 17:28 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Cześć, > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> >> And now it is very weird.... I made osd.37 up, and loop >> >> while true;do; ceph tell 1.165 query ;done >> >> > Here need to explain more - all I did was start ceph-osd id=37 on >> > storage node, in ceph osd tree this osd osd is marked as out: >> >> >> > -17 21.49995 host stor8 >> > 22 1.59999 osd.22 up 1.00000 1.00000 >> > 23 1.59999 osd.23 up 1.00000 1.00000 >> > 36 2.09999 osd.36 up 1.00000 1.00000 >> > 37 2.09999 osd.37 up 0 1.00000 >> > 38 2.50000 osd.38 up 1.00000 1.00000 >> > 39 2.50000 osd.39 up 1.00000 1.00000 >> > 40 2.50000 osd.40 up 0 1.00000 >> > 41 2.50000 osd.41 down 0 1.00000 >> > 42 2.50000 osd.42 up 1.00000 1.00000 >> > 43 1.59999 osd.43 up 1.00000 1.00000 >> >> > after start of this osd, ceph tell 1.165 query worked only for one call of this command >> >> catch this: >> >> >> https://pastebin.com/zKu06fJn >> >> here is for pg 1.60: >> >> https://pastebin.com/Xuk5iFXr > Look at the bottom, after it says > "blocked": "peering is blocked due to down osds", > Did the 1.165 pg recover? No it didn't: [root@cc1 ~]# ceph health detail HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48] pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68] pg 1.60 is down+remapped+peering, acting [68] pg 1.165 is incomplete, acting [67,88,48] [root@cc1 ~]# -- Pozdrowienia, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 17:28 ` Łukasz Chrustek @ 2017-05-24 18:16 ` Sage Weil 2017-05-24 19:47 ` Łukasz Chrustek 0 siblings, 1 reply; 35+ messages in thread From: Sage Weil @ 2017-05-24 18:16 UTC (permalink / raw) To: Łukasz Chrustek; +Cc: ceph-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 2211 bytes --] On Wed, 24 May 2017, Łukasz Chrustek wrote: > Cześć, > > > On Wed, 24 May 2017, Łukasz Chrustek wrote: > >> > >> >> And now it is very weird.... I brought osd.37 up, and the loop > >> >> while true; do ceph tell 1.165 query; done > >> > >> > I need to explain more here - all I did was start ceph-osd id=37 on > >> > the storage node; in ceph osd tree this osd is marked as out: > >> > >> > >> > -17 21.49995 host stor8 > >> > 22 1.59999 osd.22 up 1.00000 1.00000 > >> > 23 1.59999 osd.23 up 1.00000 1.00000 > >> > 36 2.09999 osd.36 up 1.00000 1.00000 > >> > 37 2.09999 osd.37 up 0 1.00000 > >> > 38 2.50000 osd.38 up 1.00000 1.00000 > >> > 39 2.50000 osd.39 up 1.00000 1.00000 > >> > 40 2.50000 osd.40 up 0 1.00000 > >> > 41 2.50000 osd.41 down 0 1.00000 > >> > 42 2.50000 osd.42 up 1.00000 1.00000 > >> > 43 1.59999 osd.43 up 1.00000 1.00000 > >> > >> > after starting this osd, ceph tell 1.165 query worked for only one call of the command > >> >> caught this: > >> > >> >> https://pastebin.com/zKu06fJn > >> > >> here is for pg 1.60: > >> > >> https://pastebin.com/Xuk5iFXr > > > Look at the bottom, after it says > > > "blocked": "peering is blocked due to down osds", > > > Did the 1.165 pg recover? > > No it didn't: > > [root@cc1 ~]# ceph health detail > HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive > pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48] > pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68] > pg 1.60 is down+remapped+peering, acting [68] > pg 1.165 is incomplete, acting [67,88,48] > [root@cc1 ~]# Hrm. ceph daemon osd.67 config set debug_osd 20 ceph daemon osd.67 config set debug_ms 1 ceph osd down 67 and capture the resulting log segment, then post it with ceph-post-file. sage ^ permalink raw reply [flat|nested] 35+ messages in thread
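End to end, that capture sequence looks roughly like this (a sketch: it must run on the host with osd.67's admin socket, the sleep duration is a guess, and the 0/5 values restored at the end are assumed defaults - check ceph.conf):

    ceph daemon osd.67 config set debug_osd 20
    ceph daemon osd.67 config set debug_ms 1
    ceph osd down 67                        # force re-peering so the attempt is logged
    sleep 120                               # let the peering attempt run
    ceph daemon osd.67 config set debug_osd 0/5
    ceph daemon osd.67 config set debug_ms 0/5
    ceph-post-file /var/log/ceph/ceph-osd.67.log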
* Re: Problem with query and any operation on PGs 2017-05-24 18:16 ` Sage Weil @ 2017-05-24 19:47 ` Łukasz Chrustek 0 siblings, 0 replies; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 19:47 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> Cześć, >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> >> >> >> And now it is very weird.... I made osd.37 up, and loop >> >> >> while true;do; ceph tell 1.165 query ;done >> >> >> >> > Here need to explain more - all I did was start ceph-osd id=37 on >> >> > storage node, in ceph osd tree this osd osd is marked as out: >> >> >> >> >> >> > -17 21.49995 host stor8 >> >> > 22 1.59999 osd.22 up 1.00000 1.00000 >> >> > 23 1.59999 osd.23 up 1.00000 1.00000 >> >> > 36 2.09999 osd.36 up 1.00000 1.00000 >> >> > 37 2.09999 osd.37 up 0 1.00000 >> >> > 38 2.50000 osd.38 up 1.00000 1.00000 >> >> > 39 2.50000 osd.39 up 1.00000 1.00000 >> >> > 40 2.50000 osd.40 up 0 1.00000 >> >> > 41 2.50000 osd.41 down 0 1.00000 >> >> > 42 2.50000 osd.42 up 1.00000 1.00000 >> >> > 43 1.59999 osd.43 up 1.00000 1.00000 >> >> >> >> > after start of this osd, ceph tell 1.165 query worked only for one call of this command >> >> >> catch this: >> >> >> >> >> https://pastebin.com/zKu06fJn >> >> >> >> here is for pg 1.60: >> >> >> >> https://pastebin.com/Xuk5iFXr >> >> > Look at the bottom, after it says >> >> > "blocked": "peering is blocked due to down osds", >> >> > Did the 1.165 pg recover? >> >> No it didn't: >> >> [root@cc1 ~]# ceph health detail >> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive >> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48] >> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68] >> pg 1.60 is down+remapped+peering, acting [68] >> pg 1.165 is incomplete, acting [67,88,48] >> [root@cc1 ~]# > Hrm. > ceph daemon osd.67 config set debug_osd 20 > ceph daemon osd.67 config set debug_ms 1 > ceph osd down 67 > and capture the log resulting log segment, then post it with > ceph-post-file. args: -- /var/log/ceph/ceph-osd.67.log /usr/bin/ceph-post-file: upload tag 05a02f14-8fd6-43da-9b9c-e42cd1fce560 /usr/bin/ceph-post-file: user: root@stor3 /usr/bin/ceph-post-file: will upload file /var/log/ceph/ceph-osd.67.log sftp> mkdir post/05a02f14-8fd6-43da-9b9c-e42cd1fce560_root@stor3_8612f2d9-bb31-4d5e-b3e7-3722f8d13314 sftp> cd post/05a02f14-8fd6-43da-9b9c-e42cd1fce560_root@stor3_8612f2d9-bb31-4d5e-b3e7-3722f8d13314 sftp> put /tmp/tmp.rggR3suNMt user sftp> put /var/log/ceph/ceph-osd.67.log /usr/bin/ceph-post-file: copy the upload id below to share with a dev: ceph-post-file: 05a02f14-8fd6-43da-9b9c-e42cd1fce560 -- Regards,, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 17:16 ` Sage Weil 2017-05-24 17:28 ` Łukasz Chrustek @ 2017-05-24 17:30 ` Łukasz Chrustek 2017-05-24 17:35 ` Łukasz Chrustek 1 sibling, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 17:30 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Cześć, > On Wed, 24 May 2017, Łukasz Chrustek wrote: >> >> >> And now it is very weird.... I brought osd.37 up, and the loop >> >> while true; do ceph tell 1.165 query; done >> >> > I need to explain more here - all I did was start ceph-osd id=37 on >> > the storage node; in ceph osd tree this osd is marked as out: >> >> >> > -17 21.49995 host stor8 >> > 22 1.59999 osd.22 up 1.00000 1.00000 >> > 23 1.59999 osd.23 up 1.00000 1.00000 >> > 36 2.09999 osd.36 up 1.00000 1.00000 >> > 37 2.09999 osd.37 up 0 1.00000 >> > 38 2.50000 osd.38 up 1.00000 1.00000 >> > 39 2.50000 osd.39 up 1.00000 1.00000 >> > 40 2.50000 osd.40 up 0 1.00000 >> > 41 2.50000 osd.41 down 0 1.00000 >> > 42 2.50000 osd.42 up 1.00000 1.00000 >> > 43 1.59999 osd.43 up 1.00000 1.00000 >> >> > after starting this osd, ceph tell 1.165 query worked for only one call of the command >> >> caught this: >> >> >> https://pastebin.com/zKu06fJn >> >> here is for pg 1.60: >> >> https://pastebin.com/Xuk5iFXr > Look at the bottom, after it says > "blocked": "peering is blocked due to down osds", for pg 1.60: all of those osds were down when the ceph tell 1.60 query loop caught its one reply. > Did the 1.165 pg recover? > sage -- Pozdrowienia, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 17:30 ` Łukasz Chrustek @ 2017-05-24 17:35 ` Łukasz Chrustek 0 siblings, 0 replies; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 17:35 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, >> On Wed, 24 May 2017, Łukasz Chrustek wrote: >>> >>> >> And now it is very weird.... I made osd.37 up, and loop >>> >> while true;do; ceph tell 1.165 query ;done >>> >>> > Here need to explain more - all I did was start ceph-osd id=37 on >>> > storage node, in ceph osd tree this osd osd is marked as out: >>> >>> >>> > -17 21.49995 host stor8 >>> > 22 1.59999 osd.22 up 1.00000 1.00000 >>> > 23 1.59999 osd.23 up 1.00000 1.00000 >>> > 36 2.09999 osd.36 up 1.00000 1.00000 >>> > 37 2.09999 osd.37 up 0 1.00000 >>> > 38 2.50000 osd.38 up 1.00000 1.00000 >>> > 39 2.50000 osd.39 up 1.00000 1.00000 >>> > 40 2.50000 osd.40 up 0 1.00000 >>> > 41 2.50000 osd.41 down 0 1.00000 >>> > 42 2.50000 osd.42 up 1.00000 1.00000 >>> > 43 1.59999 osd.43 up 1.00000 1.00000 >>> >>> > after start of this osd, ceph tell 1.165 query worked only for one call of this command >>> >> catch this: >>> >>> >> https://pastebin.com/zKu06fJn >>> >>> here is for pg 1.60: >>> >>> https://pastebin.com/Xuk5iFXr >> Look at the bottom, after it says >> "blocked": "peering is blocked due to down osds", > for pg 1.60: all osds was down, when ceph tell 1.60 query catch one > 'interrupt'. when I'm trying to use ceph-objectstore-tool I get: [root@stor3 ~]# ceph-objectstore-tool --op export --pgid 1.60 --data-path /mnt --journal-path /mnt/journal --file 1.60.export Mount failed with '(95) Operation not supported' [root@stor3 ~]# du -sh /mnt/current/1.60_head 276M /mnt/current/1.60_head [root@stor3 ~]# ls -al /mnt/current/1.60_head | wc -l 49 [root@stor3 ~]# -- Regards, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs 2017-05-24 14:47 ` Sage Weil 2017-05-24 15:00 ` Łukasz Chrustek @ 2017-05-24 21:38 ` Łukasz Chrustek 2017-05-24 21:53 ` Sage Weil 0 siblings, 1 reply; 35+ messages in thread From: Łukasz Chrustek @ 2017-05-24 21:38 UTC (permalink / raw) To: Sage Weil; +Cc: ceph-devel Hello, >> >> > This >> >> osd 6 - isn't startable > Disk completely 100% dead, or just broken enough that ceph-osd won't > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs > from this osd to recover any important writes on that osd. >> osd 10, 37, 72 are startable > With those started, I'd repeat the original sequence and get a fresh pg > query to confirm that it still wants just osd.6. > use ceph-objectstore-tool to export the pg from osd.6, stop some other > random osd (not one of these ones), import the pg into that osd, and start > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that > point. repeat with the same basic process with the other pg. Here is the output from ceph-objectstore-tool - it also didn't succeed: https://pastebin.com/7XGAHdKH -- Pozdrowienia, Łukasz Chrustek ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs

From: Sage Weil @ 2017-05-24 21:53 UTC
To: Łukasz Chrustek; +Cc: ceph-devel

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> osd 6 - isn't startable
>
> > Use ceph-objectstore-tool to export the pg from osd.6, stop some other
> > random osd (not one of these ones), import the pg into that osd, and
> > start it again. Once it is up, 'ceph osd lost 6'. The pg *should* peer
> > at that point. Repeat the same basic process with the other pg.
>
> Here is the output from ceph-objectstore-tool; it also didn't succeed:
>
> https://pastebin.com/7XGAHdKH

Hmm, btrfs:

2017-05-24 23:28:58.547456 7f500948e940 -1 filestore(/var/lib/ceph/osd/ceph-84) ERROR: /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid losing new data

You could try setting --osd-use-stale-snap as suggested.

Is it the same error with the other one?

Looking at the log you sent earlier for 1.165 on osd.67, the primary
reports:

2017-05-24 21:37:11.505256 7efdbc1e5700  5 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] enter Started/Primary/Peering/GetLog
2017-05-24 21:37:11.505291 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.37 1.165( v 112598'67281552 (112574'67278547,112598'67281552] lb 1/56500165/rbd_data.674a3ed7dffd473.0000000000000b38/head (NIBBLEWISE) local-les=112584 n=1 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505299 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.38 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505306 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.48 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505313 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.67 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505319 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.88 1.165( empty local-les=0 n=0 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505326 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] choose_acting failed

In particular, osd 37, 38, 48 and 67 all have incomplete copies of the
PG (they are mid-backfill) and 88 has nothing. Some data is lost unless
you can recover another OSD with that PG.

The set of OSDs that might have data are: 6, 10, 33, 72, 84.

If that bears no fruit, then you can force last_backfill to report
complete on one of those OSDs and it'll think it has all the data even
though some of it is likely gone. (We can pick one that is farther
along... 38, 48 and 67 all seem to match.)

sage
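One way to check whether any of those candidate OSDs (6, 10, 33, 72, 84)
still physically holds a copy of the PG is to inspect each store
offline; a sketch, assuming the default data layout and a stopped daemon
(osd.72 here is just an example):

  # list the PGs the object store still contains
  ceph-objectstore-tool --op list-pgs \
      --data-path /var/lib/ceph/osd/ceph-72 \
      --journal-path /var/lib/ceph/osd/ceph-72/journal

  # or simply look at the on-disk size of the PG directory
  du -sh /var/lib/ceph/osd/ceph-72/current/1.165_head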
* Re: Problem with query and any operation on PGs

From: Łukasz Chrustek @ 2017-05-24 22:09 UTC
To: Sage Weil; +Cc: ceph-devel

Hello,

> Hmm, btrfs:
>
> 2017-05-24 23:28:58.547456 7f500948e940 -1 filestore(/var/lib/ceph/osd/ceph-84) ERROR: /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid losing new data
>
> You could try setting --osd-use-stale-snap as suggested.

Yes... tried... and I simply get rided of 39GB data...

> Is it the same error with the other one?

Yes: https://pastebin.com/7XGAHdKH

> In particular, osd 37, 38, 48 and 67 all have incomplete copies of the
> PG (they are mid-backfill) and 88 has nothing. Some data is lost unless
> you can recover another OSD with that PG.
>
> The set of OSDs that might have data are: 6, 10, 33, 72, 84.
>
> If that bears no fruit, then you can force last_backfill to report

How do I force last_backfill?

> complete on one of those OSDs and it'll think it has all the data even
> though some of it is likely gone. (We can pick one that is farther
> along... 38, 48 and 67 all seem to match.)
>
> sage

--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs

From: Sage Weil @ 2017-05-24 22:27 UTC
To: Łukasz Chrustek; +Cc: ceph-devel

On Thu, 25 May 2017, Łukasz Chrustek wrote:
> > You could try setting --osd-use-stale-snap as suggested.
>
> Yes... tried... and I simply get rided of 39GB data...

What does "get rided" mean?

> > Is it the same error with the other one?
>
> Yes: https://pastebin.com/7XGAHdKH
>
> > If that bears no fruit, then you can force last_backfill to report
>
> How do I force last_backfill?
>
> > complete on one of those OSDs and it'll think it has all the data even
> > though some of it is likely gone. (We can pick one that is farther
> > along... 38, 48 and 67 all seem to match.)
>
> --
> Regards,
> Łukasz Chrustek
* Re: Problem with query and any operation on PGs

From: Łukasz Chrustek @ 2017-05-24 22:46 UTC
To: Sage Weil; +Cc: ceph-devel

Hello,

>> Yes... tried... and I simply get rided of 39GB data...

> What does "get rided" mean?

According to this pastebin: https://pastebin.com/QPcpkjg4

ls -R /var/lib/ceph/osd/ceph-33/current/
/var/lib/ceph/osd/ceph-33/current/:
commit_op_seq  omap

/var/lib/ceph/osd/ceph-33/current/omap:
000003.log  CURRENT  LOCK  MANIFEST-000002

Earlier the data files were still there.

>> If that bears no fruit, then you can force last_backfill to report
>> complete on one of those OSDs and it'll think it has all the data even
>> though some of it is likely gone. (We can pick one that is farther
>> along... 38, 48 and 67 all seem to match.)

Can you explain what you mean by "force last_backfill to report
complete"? The current value for PG 1.60 is MAX, and for 1.165 it is
1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head.

--
Regards,
Łukasz Chrustek
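As an aside, last_backfill can also be read straight off a stopped OSD
with ceph-objectstore-tool; a sketch, assuming the default data path
(osd.48 is just the example from this thread). A value of MAX means
backfill had finished on that copy, while an object name means backfill
had only progressed up to that object:

  ceph-objectstore-tool --op info --pgid 1.165 \
      --data-path /var/lib/ceph/osd/ceph-48 \
      --journal-path /var/lib/ceph/osd/ceph-48/journal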
* Re: Problem with query and any operation on PGs

From: Sage Weil @ 2017-05-25 2:06 UTC
To: Łukasz Chrustek; +Cc: ceph-devel

On Thu, 25 May 2017, Łukasz Chrustek wrote:
> According to this pastebin: https://pastebin.com/QPcpkjg4
>
> ls -R /var/lib/ceph/osd/ceph-33/current/
> /var/lib/ceph/osd/ceph-33/current/:
> commit_op_seq  omap
>
> /var/lib/ceph/osd/ceph-33/current/omap:
> 000003.log  CURRENT  LOCK  MANIFEST-000002
>
> Earlier the data files were still there.

Yeah, looks like all the data was deleted from the device. :(

> Can you explain what you mean by "force last_backfill to report
> complete"? The current value for PG 1.60 is MAX, and for 1.165 it is
> 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head.

ceph-objectstore-tool has a mark-complete operation. Do that on one of
the OSDs that has the more advanced last_backfill (like the one above).
After you restart it, the PG should recover.

Good luck!
sage
* Re: Problem with query and any operation on PGs

From: Łukasz Chrustek @ 2017-05-25 11:22 UTC
To: Sage Weil; +Cc: ceph-devel

Hello,

> ceph-objectstore-tool has a mark-complete operation. Do that on one of
> the OSDs that has the more advanced last_backfill (like the one above).
> After you restart it, the PG should recover.

Here (https://pastebin.com/Jv2DpcB3) is pg dump_stuck from BEFORE
running:

ceph-objectstore-tool --debug --op mark-complete --pgid 1.165 --data-path /var/lib/ceph/osd/ceph-48 --journal-path /var/lib/ceph/osd/ceph-48/journal --osd-use-stale-snap

As with the previous use of this tool, the data went away:

[root@stor5 /var/lib/ceph/osd/ceph-48]# du -sh current
20K     current
[root@stor5 /var/lib/ceph/osd/ceph-48/current]# ls -R
.:
commit_op_seq  nosnap  omap/

./omap:
000011.log  CURRENT  LOCK  LOG  LOG.old  MANIFEST-000010

After running ceph-objectstore-tool, pg dump_stuck shows:

ceph pg dump_stuck
ok
pg_stat  state                             up          up_primary  acting      acting_primary
1.39     active+remapped+backfilling       [11,4,39]   11          [5,39,70]   5
1.1a9    active+remapped+backfilling       [11,30,3]   11          [0,30,8]    0
1.b      active+remapped+backfilling       [11,36,94]  11          [38,97,70]  38
1.12f    active+remapped+backfilling       [14,11,47]  14          [14,5,69]   14
1.1d2    active+remapped+backfilling       [11,2,38]   11          [0,36,49]   0
1.133    active+remapped+backfilling       [42,11,83]  42          [42,89,21]  42
40.69    stale+active+undersized+degraded  [48]        48          [48]        48
1.9d     active+remapped+backfilling       [39,2,11]   39          [39,2,86]   39
1.a2     active+remapped+backfilling       [11,12,34]  11          [14,35,95]  14
1.10a    active+remapped+backfilling       [11,2,87]   11          [1,87,81]   1
1.70     active+remapped+backfilling       [14,39,11]  14          [14,39,4]   14
1.60     down+remapped+peering             [83,69,68]  83          [9]         9
1.eb     active+remapped+backfilling       [11,18,53]  11          [14,53,69]  14
1.8d     active+remapped+backfilling       [11,0,30]   11          [36,0,30]   36
1.118    active+remapped+backfilling       [34,11,12]  34          [34,20,86]  34
1.121    active+remapped+backfilling       [43,11,35]  43          [43,35,2]   43
1.177    active+remapped+backfilling       [14,1,11]   14          [14,1,38]   14
1.17c    active+remapped+backfilling       [5,94,11]   5           [5,94,7]    5
1.16d    active+remapped+backfilling       [96,11,53]  96          [96,52,9]   96
1.19a    active+remapped+backfilling       [11,0,14]   11          [0,17,35]   0
1.165    down+peering                      [39,55,82]  39          [39,55,82]  39
1.1a     active+remapped+backfilling       [36,52,11]  36          [36,52,96]  36
1.e7     active+remapped+backfilling       [11,35,44]  11          [34,44,9]   34

Is there any chance to rescue this cluster?

--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs

From: Łukasz Chrustek @ 2017-05-29 15:31 UTC
To: Sage Weil; +Cc: ceph-devel

Hello,

> 1.60     down+remapped+peering  [83,69,68]  83  [9]          9
> 1.165    down+peering           [39,55,82]  39  [39,55,82]   39
>
> Is there any chance to rescue this cluster?

I have now turned off all OSDs and MONs, and after that turned two of
the three MONs back on to form a quorum. On all OSD hosts every ceph
process is off, but ceph osd tree still shows old, stale data:

https://pastebin.com/pVGLxAPs

Why doesn't ceph see that all the OSDs are down? What could block it
like this?

--
Regards,
Łukasz Chrustek
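Most likely nothing is blocked here: monitors normally learn that OSDs
are down from failure reports sent by other OSDs, and with every OSD
stopped there is nobody left to report, so the map stays stale until the
monitor-side timeout (mon_osd_report_timeout, 900 seconds by default in
releases of this era) expires. A sketch of nudging the map by hand, with
the OSD id illustrative:

  # force a specific OSD to be marked down in the osdmap
  ceph osd down 37

  # or just watch the cluster state until the report timeout expires
  ceph -w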
* Re: Problem with query and any operation on PGs

From: Sage Weil @ 2017-05-30 13:21 UTC
To: Łukasz Chrustek; +Cc: ceph-devel

On Thu, 25 May 2017, Łukasz Chrustek wrote:
> > You could try setting --osd-use-stale-snap as suggested.
>
> Yes... tried... and I simply get rided of 39GB data...
>
> According to this pastebin: https://pastebin.com/QPcpkjg4
>
> ls -R /var/lib/ceph/osd/ceph-33/current/
> /var/lib/ceph/osd/ceph-33/current/:
> commit_op_seq  omap
>
> /var/lib/ceph/osd/ceph-33/current/omap:
> 000003.log  CURRENT  LOCK  MANIFEST-000002
>
> Earlier the data files were still there.

Okay, sorry I took a while to get back to you. It looks like I gave you
bad advice here! The 'nosnap' file means filestore was operating in
non-snapshotting mode, and the --osd-use-stale-snap warning that it
would lose data was real... it rolled back to an empty state and threw
out the data on the device. :( :( I'm *very* sorry about this! I haven't
looked at or worked with the btrfs mode in ages (we don't recommend it
and almost nobody uses it), but I should have been paying closer
attention.

What is the state of the cluster now?

sage
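For anyone who lands in the same spot: the dangerous case is visible on
disk before the flag is ever used, because the marker file sits right in
the store. A minimal check, assuming the default data path:

  # if 'nosnap' exists, filestore keeps no btrfs snapshots, and
  # --osd-use-stale-snap will roll the store back to an empty state
  ls /var/lib/ceph/osd/ceph-84/current/nosnap \
      && echo "no-snapshot mode: do not use --osd-use-stale-snap"

Taking a plain file-level copy of the current/<pgid>_head directories
before any destructive ceph-objectstore-tool operation is also a
sensible general precaution, though it is not something suggested in the
thread itself.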
* Re: Problem with query and any operation on PGs

From: Łukasz Chrustek @ 2017-06-10 22:45 UTC
To: Sage Weil; +Cc: ceph-devel

Hi Sage,

> Okay, sorry I took a while to get back to you.

Sorry from my side too; most of that time I was focused on this problem.

> It looks like I gave you bad advice here! The 'nosnap' file means
> filestore was operating in non-snapshotting mode, and the
> --osd-use-stale-snap warning that it would lose data was real... it
> rolled back to an empty state and threw out the data on the device.
> :( :( I'm *very* sorry about this! I haven't looked at or worked with
> the btrfs mode in ages (we don't recommend it and almost nobody uses
> it), but I should have been paying closer attention.

Thank you for your time and effort; it was important to have such help.
There were many errors in the setup of this cluster. We didn't realize
how many strange things were f...ed up...

> What is the state of the cluster now?

The cluster is dead. After a few more days of fighting with it we
decided to shut it down. We fixed the scripts for recovering volumes
from a powered-off ceph cluster (this one:
https://github.com/cmgitdream/ceph-rbd-recover-tool) and made them work
with the jewel release (10.2.7).

I set up a brand-new cluster on other hardware and the images are now
being imported into it. With some direct editing of MySQL in the
OpenStack databases, we didn't have to change anything for our clients
from the Horizon point of view. Once the dust settles, we will push our
changes to this tool to GitHub. After the migration ends we will try to
bring the dead cluster up again and take more aggressive action to make
it work anyway.

--
Regards,
Łukasz