* Problem with query and any operation on PGs
[not found] <175484591.20170523135449@tlen.pl>
@ 2017-05-23 12:48 ` Łukasz Chrustek
2017-05-23 14:17 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 12:48 UTC (permalink / raw)
To: ceph-devel
Hello,
After a terrible outage caused by the failure of a 10Gbit switch, the ceph cluster
went to HEALTH_ERR (three whole storage servers went offline at the same time
and didn't come back quickly). After cluster recovery, two PGs went into an
incomplete state; I can't query them and can't do anything with them
that would bring the cluster back to a working state. Here is an strace of
this command: https://pastebin.com/HpNFvR8Z. But... the cluster isn't entirely off:
[root@cc1 ~]# rbd ls management-vms
os-mongodb1
os-mongodb1-database
os-gitlab-root
os-mongodb1-database2
os-wiki-root
[root@cc1 ~]# rbd ls volumes
^C
[root@cc1 ~]#
and the same when pointing at each mon host directly (not pasting all three here):
[root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
os-mongodb1
os-mongodb1-database
os-gitlab-root
os-mongodb1-database2
os-wiki-root
[root@cc1 ~]# rbd -m 192.168.128.1 list volumes
^C
[root@cc1 ~]#
For all other pools from the list I can list images, except (most
importantly) volumes.
Funny thing: I can get rbd info for a particular image:
[root@cc1 ~]# rbd info
volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
size 20480 MB in 1280 objects
order 24 (16384 kB objects)
block_name_prefix: rbd_data.64a21a0a9acf52
format: 2
features: layering
flags:
parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
overlap: 3072 MB
but I can't list the whole content of the volumes pool.
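An editorial aside on why `rbd ls` can hang while `rbd info` works: the image list for a pool is served from a single `rbd_directory` object, so if that object happens to map to one of the stuck PGs, listing blocks while individual image headers (stored in other, healthy PGs) stay readable. A sketch of how to check, assuming the standard object name:

```shell
# Find which PG the pool's listing object maps to; if it is one of the
# stuck PGs reported below (1.60 or 1.165), that would explain why
# 'rbd ls volumes' hangs. 'rbd_directory' is the standard per-pool
# listing object for rbd images.
ceph osd map volumes rbd_directory
```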
[root@cc1 ~]# ceph osd pool ls
volumes
images
backups
volumes-ssd-intel-s3700
management-vms
.rgw.root
.rgw.control
.rgw
.rgw.gc
.log
.users.uid
.rgw.buckets.index
.users
.rgw.buckets.extra
.rgw.buckets
volumes-cached
cache-ssd
here is ceph osd tree:
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-7 20.88388 root ssd-intel-s3700
-11 3.19995 host ssd-stor1
56 0.79999 osd.56 up 1.00000 1.00000
57 0.79999 osd.57 up 1.00000 1.00000
58 0.79999 osd.58 up 1.00000 1.00000
59 0.79999 osd.59 up 1.00000 1.00000
-9 2.12999 host ssd-stor2
60 0.70999 osd.60 up 1.00000 1.00000
61 0.70999 osd.61 up 1.00000 1.00000
62 0.70999 osd.62 up 1.00000 1.00000
-8 2.12999 host ssd-stor3
63 0.70999 osd.63 up 1.00000 1.00000
64 0.70999 osd.64 up 1.00000 1.00000
65 0.70999 osd.65 up 1.00000 1.00000
-10 4.19998 host ssd-stor4
25 0.70000 osd.25 up 1.00000 1.00000
26 0.70000 osd.26 up 1.00000 1.00000
27 0.70000 osd.27 up 1.00000 1.00000
28 0.70000 osd.28 up 1.00000 1.00000
29 0.70000 osd.29 up 1.00000 1.00000
24 0.70000 osd.24 up 1.00000 1.00000
-12 3.41199 host ssd-stor5
73 0.85300 osd.73 up 1.00000 1.00000
74 0.85300 osd.74 up 1.00000 1.00000
75 0.85300 osd.75 up 1.00000 1.00000
76 0.85300 osd.76 up 1.00000 1.00000
-13 3.41199 host ssd-stor6
77 0.85300 osd.77 up 1.00000 1.00000
78 0.85300 osd.78 up 1.00000 1.00000
79 0.85300 osd.79 up 1.00000 1.00000
80 0.85300 osd.80 up 1.00000 1.00000
-15 2.39999 host ssd-stor7
90 0.79999 osd.90 up 1.00000 1.00000
91 0.79999 osd.91 up 1.00000 1.00000
92 0.79999 osd.92 up 1.00000 1.00000
-1 167.69969 root default
-2 33.99994 host stor1
6 3.39999 osd.6 down 0 1.00000
7 3.39999 osd.7 up 1.00000 1.00000
8 3.39999 osd.8 up 1.00000 1.00000
9 3.39999 osd.9 up 1.00000 1.00000
10 3.39999 osd.10 down 0 1.00000
11 3.39999 osd.11 down 0 1.00000
69 3.39999 osd.69 up 1.00000 1.00000
70 3.39999 osd.70 up 1.00000 1.00000
71 3.39999 osd.71 down 0 1.00000
81 3.39999 osd.81 up 1.00000 1.00000
-3 20.99991 host stor2
13 2.09999 osd.13 up 1.00000 1.00000
12 2.09999 osd.12 up 1.00000 1.00000
14 2.09999 osd.14 up 1.00000 1.00000
15 2.09999 osd.15 up 1.00000 1.00000
16 2.09999 osd.16 up 1.00000 1.00000
17 2.09999 osd.17 up 1.00000 1.00000
18 2.09999 osd.18 down 0 1.00000
19 2.09999 osd.19 up 1.00000 1.00000
20 2.09999 osd.20 up 1.00000 1.00000
21 2.09999 osd.21 up 1.00000 1.00000
-4 25.00000 host stor3
30 2.50000 osd.30 up 1.00000 1.00000
31 2.50000 osd.31 up 1.00000 1.00000
32 2.50000 osd.32 up 1.00000 1.00000
33 2.50000 osd.33 down 0 1.00000
34 2.50000 osd.34 up 1.00000 1.00000
35 2.50000 osd.35 up 1.00000 1.00000
66 2.50000 osd.66 up 1.00000 1.00000
67 2.50000 osd.67 up 1.00000 1.00000
68 2.50000 osd.68 up 1.00000 1.00000
72 2.50000 osd.72 down 0 1.00000
-5 25.00000 host stor4
44 2.50000 osd.44 up 1.00000 1.00000
45 2.50000 osd.45 up 1.00000 1.00000
46 2.50000 osd.46 down 0 1.00000
47 2.50000 osd.47 up 1.00000 1.00000
0 2.50000 osd.0 up 1.00000 1.00000
1 2.50000 osd.1 up 1.00000 1.00000
2 2.50000 osd.2 up 1.00000 1.00000
3 2.50000 osd.3 up 1.00000 1.00000
4 2.50000 osd.4 up 1.00000 1.00000
5 2.50000 osd.5 up 1.00000 1.00000
-6 14.19991 host stor5
48 1.79999 osd.48 up 1.00000 1.00000
49 1.59999 osd.49 up 1.00000 1.00000
50 1.79999 osd.50 up 1.00000 1.00000
51 1.79999 osd.51 down 0 1.00000
52 1.79999 osd.52 up 1.00000 1.00000
53 1.79999 osd.53 up 1.00000 1.00000
54 1.79999 osd.54 up 1.00000 1.00000
55 1.79999 osd.55 up 1.00000 1.00000
-14 14.39999 host stor6
82 1.79999 osd.82 up 1.00000 1.00000
83 1.79999 osd.83 up 1.00000 1.00000
84 1.79999 osd.84 up 1.00000 1.00000
85 1.79999 osd.85 up 1.00000 1.00000
86 1.79999 osd.86 up 1.00000 1.00000
87 1.79999 osd.87 up 1.00000 1.00000
88 1.79999 osd.88 up 1.00000 1.00000
89 1.79999 osd.89 up 1.00000 1.00000
-16 12.59999 host stor7
93 1.79999 osd.93 up 1.00000 1.00000
94 1.79999 osd.94 up 1.00000 1.00000
95 1.79999 osd.95 up 1.00000 1.00000
96 1.79999 osd.96 up 1.00000 1.00000
97 1.79999 osd.97 up 1.00000 1.00000
98 1.79999 osd.98 up 1.00000 1.00000
99 1.79999 osd.99 up 1.00000 1.00000
-17 21.49995 host stor8
22 1.59999 osd.22 up 1.00000 1.00000
23 1.59999 osd.23 up 1.00000 1.00000
36 2.09999 osd.36 up 1.00000 1.00000
37 2.09999 osd.37 up 1.00000 1.00000
38 2.50000 osd.38 up 1.00000 1.00000
39 2.50000 osd.39 up 1.00000 1.00000
40 2.50000 osd.40 up 1.00000 1.00000
41 2.50000 osd.41 down 0 1.00000
42 2.50000 osd.42 up 1.00000 1.00000
43 1.59999 osd.43 up 1.00000 1.00000
[root@cc1 ~]#
and ceph health detail:
ceph health detail | grep down
HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
undersized; recovery 176211/14148564 objects degraded (1.245%);
recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
pg 1.60 is stuck inactive since forever, current state
down+remapped+peering, last acting [66,69,40]
pg 1.165 is stuck inactive since forever, current state
down+remapped+peering, last acting [37]
pg 1.60 is stuck unclean since forever, current state
down+remapped+peering, last acting [66,69,40]
pg 1.165 is stuck unclean since forever, current state
down+remapped+peering, last acting [37]
pg 1.165 is down+remapped+peering, acting [37]
pg 1.60 is down+remapped+peering, acting [66,69,40]
The problematic PGs are 1.165 and 1.60.
Please advise how to unblock the volumes pool and/or get these two PGs
working - during the last night and day of trying to solve this issue
we found these PGs to be 100% empty of data.
--
Regards,
Łukasz Chrustek
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-23 12:48 ` Problem with query and any operation on PGs Łukasz Chrustek
@ 2017-05-23 14:17 ` Sage Weil
2017-05-23 14:43 ` Łukasz Chrustek
[not found] ` <1464688590.20170523185052@tlen.pl>
0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-23 14:17 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> After a terrible outage caused by the failure of a 10Gbit switch, the ceph cluster
> went to HEALTH_ERR (three whole storage servers went offline at the same time
> and didn't come back quickly). After cluster recovery, two PGs went into an
> incomplete state; I can't query them and can't do anything with them,
The thing where you can't query a PG is because the OSD is throttling
incoming work and the throttle is exhausted (the PG can't do work so it
isn't making progress). A workaround for jewel is to restart the OSD
serving the PG and do the query quickly after that (probably in a loop so
that you catch it after it starts up but before the throttle is
exhausted again). (In luminous this is fixed.)
Once you have the query output ('ceph tell $pgid query') you'll be able to
tell what is preventing the PG from peering.
You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.
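The restart-and-query workaround can be sketched as follows (pg 1.165 and osd.37 are taken from the health detail output earlier in the thread; the systemd unit name assumes a systemd-managed OSD):

```shell
# Workaround sketch for jewel: restart the OSD serving the PG, then
# poll the query in a loop to catch it before the throttle is
# exhausted again. 'timeout' keeps each attempt from hanging forever.
PGID=1.165
OSD=37
systemctl restart ceph-osd@"$OSD"
until timeout 10 ceph tell "$PGID" query > /tmp/pg-"$PGID"-query.json; do
    sleep 1
done
```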
HTH!
sage
* Re: Problem with query and any operation on PGs
2017-05-23 14:17 ` Sage Weil
@ 2017-05-23 14:43 ` Łukasz Chrustek
[not found] ` <1464688590.20170523185052@tlen.pl>
1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 14:43 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> After a terrible outage caused by the failure of a 10Gbit switch, the ceph cluster
>> went to HEALTH_ERR (three whole storage servers went offline at the same time
>> and didn't come back quickly). After cluster recovery, two PGs went into an
>> incomplete state; I can't query them and can't do anything with them,
> The thing where you can't query a PG is because the OSD is throttling
> incoming work and the throttle is exhausted (the PG can't do work so it
> isn't making progress). A workaround for jewel is to restart the OSD
> serving the PG and do the query quickly after that (probably in a loop so
> that you catch it after it starts up but before the throttle is
> exhausted again). (In luminous this is fixed.)
Thank you for the clarification.
> Once you have the query output ('ceph tell $pgid query') you'll be able to
> tell what is preventing the PG from peering.
Hmm... what kind of loop do you suggest? When I run ceph tell $pgid query,
it hangs and never returns to the console.
> You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.
There is something strange here for 1.165: how is it possible that the
acting set is [37] when it isn't in the up set [84,38,48]?
ceph pg map 1.165
osdmap e114855 pg 1.165 (1.165) -> up [84,38,48] acting [37]
The second one looks OK, but I also can't run pg query on it:
[root@cc1 ~]# ceph pg map 1.60
osdmap e114855 pg 1.60 (1.60) -> up [66,84,40] acting [66,69,40]
Do I need to restart all three OSDs at the same time?
Can you advise how to unblock access to the pool for this kind
of command:
[root@cc1 ~]# rbd ls volumes
^C
The strace for this is here: https://pastebin.com/hpbDg6gP - this time it
hangs on a futex call. Are these cases (the pg query hang and the
rbd ls problem) connected to each other?
If we find a solution for this, you will make my day (and night :) ).
Regards
Lukasz
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
[not found] ` <1464688590.20170523185052@tlen.pl>
@ 2017-05-23 17:40 ` Sage Weil
2017-05-23 21:43 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-23 17:40 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Tue, 23 May 2017, Łukasz Chrustek wrote:
> I haven't slept for over 30 hours and still can't find a solution. I
> did as you wrote, but turning off these OSDs
> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
The important bit is:
    "blocked": "peering is blocked due to down osds",
    "down_osds_we_would_probe": [
        6,
        10,
        33,
        37,
        72
    ],
    "peering_blocked_by": [
        {
            "osd": 6,
            "current_lost_at": 0,
            "comment": "starting or marking this osd lost may let us proceed"
        },
        {
            "osd": 10,
            "current_lost_at": 0,
            "comment": "starting or marking this osd lost may let us proceed"
        },
        {
            "osd": 37,
            "current_lost_at": 0,
            "comment": "starting or marking this osd lost may let us proceed"
        },
        {
            "osd": 72,
            "current_lost_at": 113771,
            "comment": "starting or marking this osd lost may let us proceed"
        }
    ]
},
Are any of those OSDs startable?
sage
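The two options the query comments point at can be sketched as follows (osd.6 is used as an example from the list above; marking an OSD lost is irreversible and discards any data for which that OSD held the only surviving copy, so it is a last resort):

```shell
# First try simply starting each blocking OSD from the
# 'down_osds_we_would_probe' list:
systemctl start ceph-osd@6

# Only if the OSD (or its disk) is truly unrecoverable, mark it lost
# so the stuck PGs can peer without it. Irreversible - last resort.
ceph osd lost 6 --yes-i-really-mean-it
```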
>
> Regards
> Lukasz Chrustek
>
>
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> Hello,
> >>
> >> After a terrible outage caused by the failure of a 10Gbit switch, the ceph cluster
> >> went to HEALTH_ERR (three whole storage servers went offline at the same time
> >> and didn't come back quickly). After cluster recovery, two PGs went into an
> >> incomplete state; I can't query them and can't do anything with them,
>
> > The thing where you can't query a PG is because the OSD is throttling
> > incoming work and the throttle is exhausted (the PG can't do work so it
> > isn't making progress). A workaround for jewel is to restart the OSD
> > serving the PG and do the query quickly after that (probably in a loop so
> > that you catch it after it starts up but before the throttle is
> > exhausted again). (In luminous this is fixed.)
>
> > Once you have the query output ('ceph tell $pgid query') you'll be able to
> > tell what is preventing the PG from peering.
>
> > You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.
>
> > HTH!
> > sage
>
>
> >> what would allow back working cluster back. here is strace of
> >> this command: https://pastebin.com/HpNFvR8Z. But... this cluster isn't enteriely off:
> >>
> >> [root@cc1 ~]# rbd ls management-vms
> >> os-mongodb1
> >> os-mongodb1-database
> >> os-gitlab-root
> >> os-mongodb1-database2
> >> os-wiki-root
> >> [root@cc1 ~]# rbd ls volumes
> >> ^C
> >> [root@cc1 ~]#
> >>
> >> and for all mon hosts (don't put all three here)
> >>
> >> [root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
> >> os-mongodb1
> >> os-mongodb1-database
> >> os-gitlab-root
> >> os-mongodb1-database2
> >> os-wiki-root
> >> [root@cc1 ~]# rbd -m 192.168.128.1 list volumes
> >> ^C
> >> [root@cc1 ~]#
> >>
> >> and all other POOLs from list, except (most important) volumes, I can
> >> list images.
> >>
> >> Fanny thing, I can list rbd info for particular image:
> >>
> >> [root@cc1 ~]# rbd info
> >> volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
> >> rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
> >> size 20480 MB in 1280 objects
> >> order 24 (16384 kB objects)
> >> block_name_prefix: rbd_data.64a21a0a9acf52
> >> format: 2
> >> features: layering
> >> flags:
> >> parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
> >> overlap: 3072 MB
> >>
> >> but can't list the whole content of pool volumes.
> >>
> >> [root@cc1 ~]# ceph osd pool ls
> >> volumes
> >> images
> >> backups
> >> volumes-ssd-intel-s3700
> >> management-vms
> >> .rgw.root
> >> .rgw.control
> >> .rgw
> >> .rgw.gc
> >> .log
> >> .users.uid
> >> .rgw.buckets.index
> >> .users
> >> .rgw.buckets.extra
> >> .rgw.buckets
> >> volumes-cached
> >> cache-ssd
> >>
> >> here is ceph osd tree:
> >>
> >> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >> -7 20.88388 root ssd-intel-s3700
> >> -11 3.19995 host ssd-stor1
> >> 56 0.79999 osd.56 up 1.00000 1.00000
> >> 57 0.79999 osd.57 up 1.00000 1.00000
> >> 58 0.79999 osd.58 up 1.00000 1.00000
> >> 59 0.79999 osd.59 up 1.00000 1.00000
> >> -9 2.12999 host ssd-stor2
> >> 60 0.70999 osd.60 up 1.00000 1.00000
> >> 61 0.70999 osd.61 up 1.00000 1.00000
> >> 62 0.70999 osd.62 up 1.00000 1.00000
> >> -8 2.12999 host ssd-stor3
> >> 63 0.70999 osd.63 up 1.00000 1.00000
> >> 64 0.70999 osd.64 up 1.00000 1.00000
> >> 65 0.70999 osd.65 up 1.00000 1.00000
> >> -10 4.19998 host ssd-stor4
> >> 25 0.70000 osd.25 up 1.00000 1.00000
> >> 26 0.70000 osd.26 up 1.00000 1.00000
> >> 27 0.70000 osd.27 up 1.00000 1.00000
> >> 28 0.70000 osd.28 up 1.00000 1.00000
> >> 29 0.70000 osd.29 up 1.00000 1.00000
> >> 24 0.70000 osd.24 up 1.00000 1.00000
> >> -12 3.41199 host ssd-stor5
> >> 73 0.85300 osd.73 up 1.00000 1.00000
> >> 74 0.85300 osd.74 up 1.00000 1.00000
> >> 75 0.85300 osd.75 up 1.00000 1.00000
> >> 76 0.85300 osd.76 up 1.00000 1.00000
> >> -13 3.41199 host ssd-stor6
> >> 77 0.85300 osd.77 up 1.00000 1.00000
> >> 78 0.85300 osd.78 up 1.00000 1.00000
> >> 79 0.85300 osd.79 up 1.00000 1.00000
> >> 80 0.85300 osd.80 up 1.00000 1.00000
> >> -15 2.39999 host ssd-stor7
> >> 90 0.79999 osd.90 up 1.00000 1.00000
> >> 91 0.79999 osd.91 up 1.00000 1.00000
> >> 92 0.79999 osd.92 up 1.00000 1.00000
> >> -1 167.69969 root default
> >> -2 33.99994 host stor1
> >> 6 3.39999 osd.6 down 0 1.00000
> >> 7 3.39999 osd.7 up 1.00000 1.00000
> >> 8 3.39999 osd.8 up 1.00000 1.00000
> >> 9 3.39999 osd.9 up 1.00000 1.00000
> >> 10 3.39999 osd.10 down 0 1.00000
> >> 11 3.39999 osd.11 down 0 1.00000
> >> 69 3.39999 osd.69 up 1.00000 1.00000
> >> 70 3.39999 osd.70 up 1.00000 1.00000
> >> 71 3.39999 osd.71 down 0 1.00000
> >> 81 3.39999 osd.81 up 1.00000 1.00000
> >> -3 20.99991 host stor2
> >> 13 2.09999 osd.13 up 1.00000 1.00000
> >> 12 2.09999 osd.12 up 1.00000 1.00000
> >> 14 2.09999 osd.14 up 1.00000 1.00000
> >> 15 2.09999 osd.15 up 1.00000 1.00000
> >> 16 2.09999 osd.16 up 1.00000 1.00000
> >> 17 2.09999 osd.17 up 1.00000 1.00000
> >> 18 2.09999 osd.18 down 0 1.00000
> >> 19 2.09999 osd.19 up 1.00000 1.00000
> >> 20 2.09999 osd.20 up 1.00000 1.00000
> >> 21 2.09999 osd.21 up 1.00000 1.00000
> >> -4 25.00000 host stor3
> >> 30 2.50000 osd.30 up 1.00000 1.00000
> >> 31 2.50000 osd.31 up 1.00000 1.00000
> >> 32 2.50000 osd.32 up 1.00000 1.00000
> >> 33 2.50000 osd.33 down 0 1.00000
> >> 34 2.50000 osd.34 up 1.00000 1.00000
> >> 35 2.50000 osd.35 up 1.00000 1.00000
> >> 66 2.50000 osd.66 up 1.00000 1.00000
> >> 67 2.50000 osd.67 up 1.00000 1.00000
> >> 68 2.50000 osd.68 up 1.00000 1.00000
> >> 72 2.50000 osd.72 down 0 1.00000
> >> -5 25.00000 host stor4
> >> 44 2.50000 osd.44 up 1.00000 1.00000
> >> 45 2.50000 osd.45 up 1.00000 1.00000
> >> 46 2.50000 osd.46 down 0 1.00000
> >> 47 2.50000 osd.47 up 1.00000 1.00000
> >> 0 2.50000 osd.0 up 1.00000 1.00000
> >> 1 2.50000 osd.1 up 1.00000 1.00000
> >> 2 2.50000 osd.2 up 1.00000 1.00000
> >> 3 2.50000 osd.3 up 1.00000 1.00000
> >> 4 2.50000 osd.4 up 1.00000 1.00000
> >> 5 2.50000 osd.5 up 1.00000 1.00000
> >> -6 14.19991 host stor5
> >> 48 1.79999 osd.48 up 1.00000 1.00000
> >> 49 1.59999 osd.49 up 1.00000 1.00000
> >> 50 1.79999 osd.50 up 1.00000 1.00000
> >> 51 1.79999 osd.51 down 0 1.00000
> >> 52 1.79999 osd.52 up 1.00000 1.00000
> >> 53 1.79999 osd.53 up 1.00000 1.00000
> >> 54 1.79999 osd.54 up 1.00000 1.00000
> >> 55 1.79999 osd.55 up 1.00000 1.00000
> >> -14 14.39999 host stor6
> >> 82 1.79999 osd.82 up 1.00000 1.00000
> >> 83 1.79999 osd.83 up 1.00000 1.00000
> >> 84 1.79999 osd.84 up 1.00000 1.00000
> >> 85 1.79999 osd.85 up 1.00000 1.00000
> >> 86 1.79999 osd.86 up 1.00000 1.00000
> >> 87 1.79999 osd.87 up 1.00000 1.00000
> >> 88 1.79999 osd.88 up 1.00000 1.00000
> >> 89 1.79999 osd.89 up 1.00000 1.00000
> >> -16 12.59999 host stor7
> >> 93 1.79999 osd.93 up 1.00000 1.00000
> >> 94 1.79999 osd.94 up 1.00000 1.00000
> >> 95 1.79999 osd.95 up 1.00000 1.00000
> >> 96 1.79999 osd.96 up 1.00000 1.00000
> >> 97 1.79999 osd.97 up 1.00000 1.00000
> >> 98 1.79999 osd.98 up 1.00000 1.00000
> >> 99 1.79999 osd.99 up 1.00000 1.00000
> >> -17 21.49995 host stor8
> >> 22 1.59999 osd.22 up 1.00000 1.00000
> >> 23 1.59999 osd.23 up 1.00000 1.00000
> >> 36 2.09999 osd.36 up 1.00000 1.00000
> >> 37 2.09999 osd.37 up 1.00000 1.00000
> >> 38 2.50000 osd.38 up 1.00000 1.00000
> >> 39 2.50000 osd.39 up 1.00000 1.00000
> >> 40 2.50000 osd.40 up 1.00000 1.00000
> >> 41 2.50000 osd.41 down 0 1.00000
> >> 42 2.50000 osd.42 up 1.00000 1.00000
> >> 43 1.59999 osd.43 up 1.00000 1.00000
> >> [root@cc1 ~]#
> >>
> >> and ceph health detail:
> >>
> >> ceph health detail | grep down
> >> HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
> >> peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
> >> undersized; recovery 176211/14148564 objects degraded (1.245%);
> >> recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
> >> pg 1.60 is stuck inactive since forever, current state
> >> down+remapped+peering, last acting [66,69,40]
> >> pg 1.165 is stuck inactive since forever, current state
> >> down+remapped+peering, last acting [37]
> >> pg 1.60 is stuck unclean since forever, current state
> >> down+remapped+peering, last acting [66,69,40]
> >> pg 1.165 is stuck unclean since forever, current state
> >> down+remapped+peering, last acting [37]
> >> pg 1.165 is down+remapped+peering, acting [37]
> >> pg 1.60 is down+remapped+peering, acting [66,69,40]
> >>
> >>
> >> problematic pgs are 1.165 and 1.60.
> >>
> >> Please advice how to unblock pool volumes and/or make this two pgs
> >> working - in a last night and day, when we tried to solve this issue
> >> these pgs are for 100% empty from data.
> >>
> >>
> >>
> >>
> >> --
> >> Pozdrowienia,
> >> Łukasz Chrustek
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >>
>
>
>
> --
> Pozdrowienia,
> Łukasz Chrustek
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-23 17:40 ` Sage Weil
@ 2017-05-23 21:43 ` Łukasz Chrustek
2017-05-23 21:48 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 21:43 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi,
> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> I'm not sleeping for over 30 hours, and still can't find solution. I
>> did, as You wrote, but turning off this
>> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> The important bit is:
> "blocked": "peering is blocked due to down osds",
> "down_osds_we_would_probe": [
> 6,
> 10,
> 33,
> 37,
> 72
> ],
> "peering_blocked_by": [
> {
> "osd": 6,
> "current_lost_at": 0,
> "comment": "starting or marking this osd lost may let
> us proceed"
> },
> {
> "osd": 10,
> "current_lost_at": 0,
> "comment": "starting or marking this osd lost may let
> us proceed"
> },
> {
> "osd": 37,
> "current_lost_at": 0,
> "comment": "starting or marking this osd lost may let
> us proceed"
> },
> {
> "osd": 72,
> "current_lost_at": 113771,
> "comment": "starting or marking this osd lost may let
> us proceed"
> }
> ]
> },
> Are any of those OSDs startable?
They were all up and running - but I decided to shut them down and mark
them out of the cluster. Now Ceph looks like it is working OK, but two
PGs are still in the down state - how do I get rid of that?
ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+remapped+peering, last acting [38,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+remapped+peering, acting [38,48]
[root@cc1 ~]# ceph -s
cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
health HEALTH_WARN
2 pgs down
2 pgs peering
2 pgs stuck inactive
monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
election epoch 872, quorum 0,1,2 cc1,cc2,cc3
osdmap e115175: 100 osds: 88 up, 86 in; 2 remapped pgs
pgmap v67583069: 3520 pgs, 17 pools, 26675 GB data, 4849 kobjects
76638 GB used, 107 TB / 182 TB avail
3515 active+clean
3 active+clean+scrubbing+deep
2 down+remapped+peering
client io 0 B/s rd, 869 kB/s wr, 14 op/s rd, 113 op/s wr
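For scripting follow-up actions, the per-PG lines in `ceph health detail` can be parsed mechanically. Below is a minimal sketch (an illustrative helper, not a tool from this thread), assuming the pre-luminous plain-text format shown above:

```python
import re

# Matches both forms seen in `ceph health detail` output, e.g.:
#   pg 1.60 is down+remapped+peering, acting [66,40]
#   pg 1.165 is stuck inactive since forever, current state
#       down+remapped+peering, last acting [38,48]
PG_LINE = re.compile(r"pg (\S+) is .*?([\w+]+), (?:last )?acting \[([\d,]+)\]")

def parse_health_detail(text):
    """Return {pgid: (state, [acting osd ids])} for each per-PG line."""
    pgs = {}
    for line in text.splitlines():
        m = PG_LINE.match(line.strip())
        if m:
            pgid, state, acting = m.groups()
            pgs[pgid] = (state, [int(o) for o in acting.split(",")])
    return pgs

detail = ("pg 1.60 is down+remapped+peering, acting [66,40]\n"
          "pg 1.165 is down+remapped+peering, acting [38,48]")
print(parse_health_detail(detail))
# → {'1.60': ('down+remapped+peering', [66, 40]),
#    '1.165': ('down+remapped+peering', [38, 48])}
```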
--
Regards
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-23 21:43 ` Łukasz Chrustek
@ 2017-05-23 21:48 ` Sage Weil
2017-05-24 13:19 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-23 21:48 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Cześć,
>
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> I'm not sleeping for over 30 hours, and still can't find solution. I
> >> did, as You wrote, but turning off this
> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>
> > The important bit is:
>
> > "blocked": "peering is blocked due to down osds",
> > "down_osds_we_would_probe": [
> > 6,
> > 10,
> > 33,
> > 37,
> > 72
> > ],
> > "peering_blocked_by": [
> > {
> > "osd": 6,
> > "current_lost_at": 0,
> > "comment": "starting or marking this osd lost may let
> > us proceed"
> > },
> > {
> > "osd": 10,
> > "current_lost_at": 0,
> > "comment": "starting or marking this osd lost may let
> > us proceed"
> > },
> > {
> > "osd": 37,
> > "current_lost_at": 0,
> > "comment": "starting or marking this osd lost may let
> > us proceed"
> > },
> > {
> > "osd": 72,
> > "current_lost_at": 113771,
> > "comment": "starting or marking this osd lost may let
> > us proceed"
> > }
> > ]
> > },
>
> > Are any of those OSDs startable?
>
> They were all up and running - but I decided to shut them down and out
> them from ceph, now it looks like ceph working ok, but still two PGs
> are in down state, how to get rid of it ?
If you haven't deleted the data, you should start the OSDs back up.
If they are partially damaged you can use ceph-objectstore-tool to
extract just the PGs in question to make sure you haven't lost anything,
inject them on some other OSD(s) and restart those, and *then* mark the
bad OSDs as 'lost'.
If all else fails, you can just mark those OSDs 'lost', but in doing so
you might be telling the cluster to lose data.
The best thing to do is definitely to get those OSDs started again.
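The export/inject sequence described above can be sketched as a command list. This is a hypothetical illustration only: the paths assume the default filestore layout under /var/lib/ceph/osd, the PG and OSD ids are made-up placeholders, and the ceph-objectstore-tool flags should be checked against the exact Ceph version before running anything.

```python
def recovery_commands(pgid, bad_osd, donor_osd, dump_dir="/root/pg-backup"):
    """Build (as strings) the commands to export a PG from a damaged OSD,
    inject it into a healthy one, and only then mark the damaged OSD lost.
    Both OSD daemons must be stopped while ceph-objectstore-tool runs."""
    src = f"/var/lib/ceph/osd/ceph-{bad_osd}"
    dst = f"/var/lib/ceph/osd/ceph-{donor_osd}"
    export_file = f"{dump_dir}/{pgid}.export"
    return [
        f"systemctl stop ceph-osd@{bad_osd} ceph-osd@{donor_osd}",
        f"ceph-objectstore-tool --data-path {src} --journal-path {src}/journal "
        f"--pgid {pgid} --op export --file {export_file}",
        f"ceph-objectstore-tool --data-path {dst} --journal-path {dst}/journal "
        f"--op import --file {export_file}",
        f"systemctl start ceph-osd@{donor_osd}",
        f"ceph osd lost {bad_osd} --yes-i-really-mean-it",
    ]

for cmd in recovery_commands("1.165", 72, 48):
    print(cmd)
```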
sage
>
> ceph health detail
> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state down+remapped+peering, last acting [38,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+remapped+peering, acting [38,48]
> [root@cc1 ~]# ceph -s
> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
> health HEALTH_WARN
> 2 pgs down
> 2 pgs peering
> 2 pgs stuck inactive
> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
> election epoch 872, quorum 0,1,2 cc1,cc2,cc3
> osdmap e115175: 100 osds: 88 up, 86 in; 2 remapped pgs
> pgmap v67583069: 3520 pgs, 17 pools, 26675 GB data, 4849 kobjects
> 76638 GB used, 107 TB / 182 TB avail
> 3515 active+clean
> 3 active+clean+scrubbing+deep
> 2 down+remapped+peering
> client io 0 B/s rd, 869 kB/s wr, 14 op/s rd, 113 op/s wr
>
> --
> Regards
> Łukasz Chrustek
>
>
* Re: Problem with query and any operation on PGs
2017-05-23 21:48 ` Sage Weil
@ 2017-05-24 13:19 ` Łukasz Chrustek
2017-05-24 13:37 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 13:19 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi,
> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>>
>> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> I'm not sleeping for over 30 hours, and still can't find solution. I
>> >> did, as You wrote, but turning off this
>> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>>
>> > The important bit is:
>>
>> > "blocked": "peering is blocked due to down osds",
>> > "down_osds_we_would_probe": [
>> > 6,
>> > 10,
>> > 33,
>> > 37,
>> > 72
>> > ],
>> > "peering_blocked_by": [
>> > {
>> > "osd": 6,
>> > "current_lost_at": 0,
>> > "comment": "starting or marking this osd lost may let
>> > us proceed"
>> > },
>> > {
>> > "osd": 10,
>> > "current_lost_at": 0,
>> > "comment": "starting or marking this osd lost may let
>> > us proceed"
>> > },
>> > {
>> > "osd": 37,
>> > "current_lost_at": 0,
>> > "comment": "starting or marking this osd lost may let
>> > us proceed"
>> > },
>> > {
>> > "osd": 72,
>> > "current_lost_at": 113771,
>> > "comment": "starting or marking this osd lost may let
>> > us proceed"
>> > }
>> > ]
>> > },
>>
>> > Are any of those OSDs startable?
>>
>> They were all up and running - but I decided to shut them down and out
>> them from ceph, now it looks like ceph working ok, but still two PGs
>> are in down state, how to get rid of it ?
> If you haven't deleted the data, you should start the OSDs back up.
> If they are partially damanged you can use ceph-objectstore-tool to
> extract just the PGs in question to make sure you haven't lost anything,
> inject them on some other OSD(s) and restart those, and *then* mark the
> bad OSDs as 'lost'.
> If all else fails, you can just mark those OSDs 'lost', but in doing so
> you might be telling the cluster to lose data.
> The best thing to do is definitely to get those OSDs started again.
Now the situation looks like this:
[root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
size 500 GB in 128000 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.ed9d394a851426
format: 2
features: layering
flags:
[root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
(output cut)
rbd_data.ed9d394a851426.000000000000447c
rbd_data.ed9d394a851426.0000000000010857
rbd_data.ed9d394a851426.000000000000ec8b
rbd_data.ed9d394a851426.000000000000fa43
rbd_data.ed9d394a851426.000000000001ef2d
^C
It hangs on this object and doesn't go any further. rbd cp also hangs...
rbd map - also...
Can you advise on a solution for this case?
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 13:19 ` Łukasz Chrustek
@ 2017-05-24 13:37 ` Sage Weil
2017-05-24 13:58 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 13:37 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
>
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >>
> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I
> >> >> did, as You wrote, but turning off this
> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> >>
> >> > The important bit is:
> >>
> >> > "blocked": "peering is blocked due to down osds",
> >> > "down_osds_we_would_probe": [
> >> > 6,
> >> > 10,
> >> > 33,
> >> > 37,
> >> > 72
> >> > ],
> >> > "peering_blocked_by": [
> >> > {
> >> > "osd": 6,
> >> > "current_lost_at": 0,
> >> > "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> > },
> >> > {
> >> > "osd": 10,
> >> > "current_lost_at": 0,
> >> > "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> > },
> >> > {
> >> > "osd": 37,
> >> > "current_lost_at": 0,
> >> > "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> > },
> >> > {
> >> > "osd": 72,
> >> > "current_lost_at": 113771,
> >> > "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> > }
> >> > ]
> >> > },
> >>
> >> > Are any of those OSDs startable?
> >>
> >> They were all up and running - but I decided to shut them down and out
> >> them from ceph, now it looks like ceph working ok, but still two PGs
> >> are in down state, how to get rid of it ?
>
> > If you haven't deleted the data, you should start the OSDs back up.
>
> > If they are partially damanged you can use ceph-objectstore-tool to
> > extract just the PGs in question to make sure you haven't lost anything,
> > inject them on some other OSD(s) and restart those, and *then* mark the
> > bad OSDs as 'lost'.
>
> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> > you might be telling the cluster to lose data.
>
> > The best thing to do is definitely to get those OSDs started again.
>
> Now situation looks like this:
>
> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
> size 500 GB in 128000 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.ed9d394a851426
> format: 2
> features: layering
> flags:
>
> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
> (output cutted)
> rbd_data.ed9d394a851426.000000000000447c
> rbd_data.ed9d394a851426.0000000000010857
> rbd_data.ed9d394a851426.000000000000ec8b
> rbd_data.ed9d394a851426.000000000000fa43
> rbd_data.ed9d394a851426.000000000001ef2d
> ^C
>
> it hangs on this object and isn't going further. rbd cp also hangs...
> rbd map - also...
>
> can You advice what can be solution for this case ?
The hang is due to OSD throttling (see my first reply for how to work
around that and get a pg query). But you already did that, and the cluster
told you which OSDs it needs to see up in order for it to peer and
recover. If you haven't destroyed those disks, you should start those
osds and it should be fine. If you've destroyed the data or the disks are
truly broken and dead, then you can mark those OSDs lost and the cluster
*may* recover (but that is hard to say given the information you've shared).
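The OSDs that block peering can also be read mechanically out of the `ceph pg <pgid> query` JSON, rather than eyeballed. A minimal sketch (an illustration assuming the recovery_state layout quoted earlier in this thread):

```python
def peering_blockers(pg_query):
    """Recursively collect the osd ids listed under any
    'peering_blocked_by' key in a `ceph pg <pgid> query` dump.
    These are the OSDs to start (or, as a last resort, mark lost)."""
    blockers = []

    def walk(node):
        if isinstance(node, dict):
            blockers.extend(e["osd"] for e in node.get("peering_blocked_by", []))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(pg_query)
    return sorted(set(blockers))

# Trimmed-down sample matching the query output quoted in this thread.
sample = {"recovery_state": [{
    "name": "Started/Primary/Peering",
    "blocked": "peering is blocked due to down osds",
    "down_osds_we_would_probe": [6, 10, 33, 37, 72],
    "peering_blocked_by": [
        {"osd": 6, "current_lost_at": 0},
        {"osd": 10, "current_lost_at": 0},
        {"osd": 37, "current_lost_at": 0},
        {"osd": 72, "current_lost_at": 113771},
    ]}]}
print(peering_blockers(sample))  # → [6, 10, 37, 72]
```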
sage
* Re: Problem with query and any operation on PGs
2017-05-24 13:37 ` Sage Weil
@ 2017-05-24 13:58 ` Łukasz Chrustek
2017-05-24 14:02 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 13:58 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>>
>> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >>
>> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I
>> >> >> did, as You wrote, but turning off this
>> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> >>
>> >> > The important bit is:
>> >>
>> >> > "blocked": "peering is blocked due to down osds",
>> >> > "down_osds_we_would_probe": [
>> >> > 6,
>> >> > 10,
>> >> > 33,
>> >> > 37,
>> >> > 72
>> >> > ],
>> >> > "peering_blocked_by": [
>> >> > {
>> >> > "osd": 6,
>> >> > "current_lost_at": 0,
>> >> > "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> > },
>> >> > {
>> >> > "osd": 10,
>> >> > "current_lost_at": 0,
>> >> > "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> > },
>> >> > {
>> >> > "osd": 37,
>> >> > "current_lost_at": 0,
>> >> > "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> > },
>> >> > {
>> >> > "osd": 72,
>> >> > "current_lost_at": 113771,
>> >> > "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> > }
>> >> > ]
>> >> > },
>> >>
>> >> > Are any of those OSDs startable?
>> >>
>> >> They were all up and running - but I decided to shut them down and out
>> >> them from ceph, now it looks like ceph working ok, but still two PGs
>> >> are in down state, how to get rid of it ?
>>
>> > If you haven't deleted the data, you should start the OSDs back up.
>>
>> > If they are partially damanged you can use ceph-objectstore-tool to
>> > extract just the PGs in question to make sure you haven't lost anything,
>> > inject them on some other OSD(s) and restart those, and *then* mark the
>> > bad OSDs as 'lost'.
>>
>> > If all else fails, you can just mark those OSDs 'lost', but in doing so
>> > you might be telling the cluster to lose data.
>>
>> > The best thing to do is definitely to get those OSDs started again.
>>
>> Now situation looks like this:
>>
>> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
>> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
>> size 500 GB in 128000 objects
>> order 22 (4096 kB objects)
>> block_name_prefix: rbd_data.ed9d394a851426
>> format: 2
>> features: layering
>> flags:
>>
>> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
>> (output cutted)
>> rbd_data.ed9d394a851426.000000000000447c
>> rbd_data.ed9d394a851426.0000000000010857
>> rbd_data.ed9d394a851426.000000000000ec8b
>> rbd_data.ed9d394a851426.000000000000fa43
>> rbd_data.ed9d394a851426.000000000001ef2d
>> ^C
>>
>> it hangs on this object and isn't going further. rbd cp also hangs...
>> rbd map - also...
>>
>> can You advice what can be solution for this case ?
> The hang is due to OSD throttling (see my first reply for how to wrok
> around that and get a pg query). But you already did that and the cluster
> told you which OSDs it needs to see up in order for it to peer and
> recover. If you haven't destroyed those disks, you should start those
> osds and it shoudl be fine. If you've destroyed the data or the disks are
> truly broken and dead, then you can mark those OSDs lost and the cluster
> *maybe* recover (but hard to say given the information you've shared).
> sage
What information can I provide so you can tell whether it is recoverable?
Here are ceph -s and ceph health detail:
[root@cc1 ~]# ceph -s
cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
health HEALTH_WARN
2 pgs down
2 pgs peering
2 pgs stuck inactive
monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
election epoch 872, quorum 0,1,2 cc1,cc2,cc3
osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
76705 GB used, 107 TB / 182 TB avail
4030 active+clean
1 down+remapped+peering
1 down+peering
client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
[root@cc1 ~]# ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+peering, acting [67,88,48]
[root@cc1 ~]#
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 13:58 ` Łukasz Chrustek
@ 2017-05-24 14:02 ` Sage Weil
2017-05-24 14:18 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 14:02 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >>
> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> Cześć,
> >> >>
> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I
> >> >> >> did, as You wrote, but turning off this
> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> >> >>
> >> >> > The important bit is:
> >> >>
> >> >> > "blocked": "peering is blocked due to down osds",
> >> >> > "down_osds_we_would_probe": [
> >> >> > 6,
> >> >> > 10,
> >> >> > 33,
> >> >> > 37,
> >> >> > 72
> >> >> > ],
> >> >> > "peering_blocked_by": [
> >> >> > {
> >> >> > "osd": 6,
> >> >> > "current_lost_at": 0,
> >> >> > "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
> >> >> > },
> >> >> > {
> >> >> > "osd": 10,
> >> >> > "current_lost_at": 0,
> >> >> > "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
> >> >> > },
> >> >> > {
> >> >> > "osd": 37,
> >> >> > "current_lost_at": 0,
> >> >> > "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
> >> >> > },
> >> >> > {
> >> >> > "osd": 72,
> >> >> > "current_lost_at": 113771,
> >> >> > "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
These are the osds (6, 10, 37, 72).
> >> >> > }
> >> >> > ]
> >> >> > },
> >> >>
> >> >> > Are any of those OSDs startable?
This
> >> >>
> >> >> They were all up and running - but I decided to shut them down and out
> >> >> them from ceph, now it looks like ceph working ok, but still two PGs
> >> >> are in down state, how to get rid of it ?
> >>
> >> > If you haven't deleted the data, you should start the OSDs back up.
This
> >>
> >> > If they are partially damanged you can use ceph-objectstore-tool to
> >> > extract just the PGs in question to make sure you haven't lost anything,
> >> > inject them on some other OSD(s) and restart those, and *then* mark the
> >> > bad OSDs as 'lost'.
> >>
> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> >> > you might be telling the cluster to lose data.
> >>
> >> > The best thing to do is definitely to get those OSDs started again.
This
> >>
> >> Now situation looks like this:
> >>
> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
> >> size 500 GB in 128000 objects
> >> order 22 (4096 kB objects)
> >> block_name_prefix: rbd_data.ed9d394a851426
> >> format: 2
> >> features: layering
> >> flags:
> >>
> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
> >> (output cutted)
> >> rbd_data.ed9d394a851426.000000000000447c
> >> rbd_data.ed9d394a851426.0000000000010857
> >> rbd_data.ed9d394a851426.000000000000ec8b
> >> rbd_data.ed9d394a851426.000000000000fa43
> >> rbd_data.ed9d394a851426.000000000001ef2d
> >> ^C
> >>
> >> it hangs on this object and isn't going further. rbd cp also hangs...
> >> rbd map - also...
> >>
> >> can You advice what can be solution for this case ?
>
> > The hang is due to OSD throttling (see my first reply for how to wrok
> > around that and get a pg query). But you already did that and the cluster
> > told you which OSDs it needs to see up in order for it to peer and
> > recover. If you haven't destroyed those disks, you should start those
> > osds and it shoudl be fine. If you've destroyed the data or the disks are
> > truly broken and dead, then you can mark those OSDs lost and the cluster
> > *maybe* recover (but hard to say given the information you've shared).
This
>
> > sage
>
> What information I can bring to You to say it is recoverable ?
>
> here are ceph -s and ceph health detail:
>
> [root@cc1 ~]# ceph -s
> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
> health HEALTH_WARN
> 2 pgs down
> 2 pgs peering
> 2 pgs stuck inactive
> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
> election epoch 872, quorum 0,1,2 cc1,cc2,cc3
> osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
> pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
> 76705 GB used, 107 TB / 182 TB avail
> 4030 active+clean
> 1 down+remapped+peering
> 1 down+peering
> client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
> [root@cc1 ~]# ceph health detail
> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+peering, acting [67,88,48]
> [root@cc1 ~]#
>
> --
> Regards,
> Łukasz Chrustek
>
>
>
* Re: Problem with query and any operation on PGs
2017-05-24 14:02 ` Sage Weil
@ 2017-05-24 14:18 ` Łukasz Chrustek
2017-05-24 14:47 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 14:18 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>>
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >>
>> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> Cześć,
>> >> >>
>> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> I'm not sleeping for over 30 hours, and still can't find solution. I
>> >> >> >> did, as You wrote, but turning off this
>> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> >> >>
>> >> >> > The important bit is:
>> >> >>
>> >> >> > "blocked": "peering is blocked due to down osds",
>> >> >> > "down_osds_we_would_probe": [
>> >> >> > 6,
>> >> >> > 10,
>> >> >> > 33,
>> >> >> > 37,
>> >> >> > 72
>> >> >> > ],
>> >> >> > "peering_blocked_by": [
>> >> >> > {
>> >> >> > "osd": 6,
>> >> >> > "current_lost_at": 0,
>> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
>> >> >> > },
>> >> >> > {
>> >> >> > "osd": 10,
>> >> >> > "current_lost_at": 0,
>> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
>> >> >> > },
>> >> >> > {
>> >> >> > "osd": 37,
>> >> >> > "current_lost_at": 0,
>> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
>> >> >> > },
>> >> >> > {
>> >> >> > "osd": 72,
>> >> >> > "current_lost_at": 113771,
>> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
> These are the osds (6, 10, 37, 72).
>> >> >> > }
>> >> >> > ]
>> >> >> > },
>> >> >>
>> >> >> > Are any of those OSDs startable?
> This
osd 6 - isn't startable
osds 10, 37 and 72 - are startable
>> >> >>
>> >> >> They were all up and running - but I decided to shut them down and out
>> >> >> them from ceph, now it looks like ceph working ok, but still two PGs
>> >> >> are in down state, how to get rid of it ?
>> >>
>> >> > If you haven't deleted the data, you should start the OSDs back up.
> This
By "OSDs back up" do you mean copying /var/lib/ceph/osd/ceph-72/* to some
other (non-Ceph) disk?
>> >>
>> >> > If they are partially damanged you can use ceph-objectstore-tool to
>> >> > extract just the PGs in question to make sure you haven't lost anything,
>> >> > inject them on some other OSD(s) and restart those, and *then* mark the
>> >> > bad OSDs as 'lost'.
>> >>
>> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
>> >> > you might be telling the cluster to lose data.
>> >>
>> >> > The best thing to do is definitely to get those OSDs started again.
> This
There were actions on these PGs that destroyed them. I started the
osds (the three that are startable) - this didn't solve the
situation. I should add that there are other pools on this cluster; only
the pool with the broken/down PGs has a problem.
>> >>
>> >> Now situation looks like this:
>> >>
>> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
>> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
>> >> size 500 GB in 128000 objects
>> >> order 22 (4096 kB objects)
>> >> block_name_prefix: rbd_data.ed9d394a851426
>> >> format: 2
>> >> features: layering
>> >> flags:
>> >>
>> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
>> >> (output cutted)
>> >> rbd_data.ed9d394a851426.000000000000447c
>> >> rbd_data.ed9d394a851426.0000000000010857
>> >> rbd_data.ed9d394a851426.000000000000ec8b
>> >> rbd_data.ed9d394a851426.000000000000fa43
>> >> rbd_data.ed9d394a851426.000000000001ef2d
>> >> ^C
>> >>
>> >> it hangs on this object and isn't going further. rbd cp also hangs...
>> >> rbd map - also...
>> >>
>> >> can You advice what can be solution for this case ?
>>
>> > The hang is due to OSD throttling (see my first reply for how to wrok
>> > around that and get a pg query). But you already did that and the cluster
>> > told you which OSDs it needs to see up in order for it to peer and
>> > recover. If you haven't destroyed those disks, you should start those
>> > osds and it shoudl be fine. If you've destroyed the data or the disks are
>> > truly broken and dead, then you can mark those OSDs lost and the cluster
>> > *maybe* recover (but hard to say given the information you've shared).
> This
[root@cc1 ~]# ceph osd lost 10 --yes-i-really-mean-it
marked osd lost in epoch 115310
[root@cc1 ~]# ceph osd lost 37 --yes-i-really-mean-it
marked osd lost in epoch 115314
[root@cc1 ~]# ceph osd lost 72 --yes-i-really-mean-it
marked osd lost in epoch 115317
[root@cc1 ~]# ceph -s
cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
health HEALTH_WARN
2 pgs down
2 pgs peering
2 pgs stuck inactive
monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
election epoch 872, quorum 0,1,2 cc1,cc2,cc3
osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
pgmap v67642483: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
76718 GB used, 107 TB / 182 TB avail
4030 active+clean
1 down+remapped+peering
1 down+peering
client io 14624 kB/s rd, 31619 kB/s wr, 382 op/s rd, 228 op/s wr
[root@cc1 ~]# ceph -s
cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
health HEALTH_WARN
2 pgs down
2 pgs peering
2 pgs stuck inactive
monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
election epoch 872, quorum 0,1,2 cc1,cc2,cc3
osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
pgmap v67642485: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
76718 GB used, 107 TB / 182 TB avail
4030 active+clean
1 down+remapped+peering
1 down+peering
client io 17805 kB/s rd, 18787 kB/s wr, 215 op/s rd, 107 op/s wr
>>
>> > sage
>>
>> What information I can bring to You to say it is recoverable ?
>>
>> here are ceph -s and ceph health detail:
>>
>> [root@cc1 ~]# ceph -s
>> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
>> health HEALTH_WARN
>> 2 pgs down
>> 2 pgs peering
>> 2 pgs stuck inactive
>> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
>> election epoch 872, quorum 0,1,2 cc1,cc2,cc3
>> osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
>> pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
>> 76705 GB used, 107 TB / 182 TB avail
>> 4030 active+clean
>> 1 down+remapped+peering
>> 1 down+peering
>> client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
>> [root@cc1 ~]# ceph health detail
>> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
>> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
>> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>> [root@cc1 ~]#
>>
>> --
>> Regards,
>> Łukasz Chrustek
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
--
Pozdrowienia,
Łukasz Chrustek
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-24 14:18 ` Łukasz Chrustek
@ 2017-05-24 14:47 ` Sage Weil
2017-05-24 15:00 ` Łukasz Chrustek
2017-05-24 21:38 ` Łukasz Chrustek
0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-24 14:47 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >>
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Cześć,
> >> >>
> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> Cześć,
> >> >> >>
> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution. I
> >> >> >> >> did as You wrote, but turning off these osds
> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
> >> >> >>
> >> >> >> > The important bit is:
> >> >> >>
> >> >> >> > "blocked": "peering is blocked due to down osds",
> >> >> >> > "down_osds_we_would_probe": [
> >> >> >> > 6,
> >> >> >> > 10,
> >> >> >> > 33,
> >> >> >> > 37,
> >> >> >> > 72
> >> >> >> > ],
> >> >> >> > "peering_blocked_by": [
> >> >> >> > {
> >> >> >> > "osd": 6,
> >> >> >> > "current_lost_at": 0,
> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> >> >> >> > },
> >> >> >> > {
> >> >> >> > "osd": 10,
> >> >> >> > "current_lost_at": 0,
> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> >> >> >> > },
> >> >> >> > {
> >> >> >> > "osd": 37,
> >> >> >> > "current_lost_at": 0,
> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> >> >> >> > },
> >> >> >> > {
> >> >> >> > "osd": 72,
> >> >> >> > "current_lost_at": 113771,
> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
>
> > These are the osds (6, 10, 37, 72).
>
> >> >> >> > }
> >> >> >> > ]
> >> >> >> > },
> >> >> >>
> >> >> >> > Are any of those OSDs startable?
>
> > This
>
> osd 6 - isn't startable
Disk completely dead, or just broken enough that ceph-osd won't
start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
from this osd to recover any important writes on that osd.
> osd 10, 37, 72 are startable
With those started, I'd repeat the original sequence and get a fresh pg
query to confirm that it still wants just osd.6.
Use ceph-objectstore-tool to export the pg from osd.6, stop some other
random osd (not one of these ones), import the pg into that osd, and start
it again. Once it is up, run 'ceph osd lost 6'. The pg *should* peer at that
point. Repeat the same basic process with the other pg.
s
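The export/import procedure described above can be sketched as a dry-run shell script. It only prints the commands instead of executing them; the PG id 1.60, the source osd.6, the target osd.55, and the filestore paths are illustrative assumptions (osd.55 in particular is a hypothetical "random" healthy OSD, not one named in the thread), and the ceph-objectstore-tool flags are the Infernalis/Jewel-era ones:

```shell
#!/bin/sh
# Dry-run sketch of the export/import procedure described above.
# Assumptions: pg 1.60 lives on the broken osd.6, osd.55 is a healthy
# target OSD, and both OSDs use the filestore layout under
# /var/lib/ceph/osd/. Both OSDs must be stopped while the tool runs.
PG=1.60
SRC=6
DST=55     # hypothetical healthy OSD chosen as the import target

CMDS=""
run() { CMDS="$CMDS$*
"; echo "$*"; }   # print each command instead of executing it

# 1. Export the stuck PG from the broken OSD.
run ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$SRC \
    --journal-path /var/lib/ceph/osd/ceph-$SRC/journal \
    --op export --pgid $PG --file /root/pg-$PG.export

# 2. Stop the target OSD, import the PG there, and start it again.
run systemctl stop ceph-osd@$DST
run ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$DST \
    --journal-path /var/lib/ceph/osd/ceph-$DST/journal \
    --op import --file /root/pg-$PG.export
run systemctl start ceph-osd@$DST

# 3. Once the target is up, tell the cluster osd.6 is gone.
run ceph osd lost $SRC --yes-i-really-mean-it
```

If the printed commands look right, replacing the `echo` in `run` with `"$@"` would execute them; keeping the export file around until the PG has actually peered is the safe choice.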
>
> >> >> >>
> >> >> >> They were all up and running - but I decided to shut them down and out
> >> >> >> them from ceph. Now it looks like ceph is working ok, but two PGs are
> >> >> >> still in a down state; how do I get rid of that?
> >> >>
> >> >> > If you haven't deleted the data, you should start the OSDs back up.
>
> > This
>
> By backing up the OSDs, do You mean copying /var/lib/ceph/osd/ceph-72/* to
> some other (non-ceph) disk?
>
> >> >>
> >> >> > If they are partially damaged you can use ceph-objectstore-tool to
> >> >> > extract just the PGs in question to make sure you haven't lost anything,
> >> >> > inject them on some other OSD(s) and restart those, and *then* mark the
> >> >> > bad OSDs as 'lost'.
> >> >>
> >> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> >> >> > you might be telling the cluster to lose data.
> >> >>
> >> >> > The best thing to do is definitely to get those OSDs started again.
>
> > This
>
> There were actions on these PGs that destroyed them. I started those
> osds (the three that are startable) - this didn't resolve the
> situation. I should add that there are other pools on this cluster;
> the problem is only with the pool that has the broken/down PGs.
> >> >>
> >> >> Now situation looks like this:
> >> >>
> >> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
> >> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
> >> >> size 500 GB in 128000 objects
> >> >> order 22 (4096 kB objects)
> >> >> block_name_prefix: rbd_data.ed9d394a851426
> >> >> format: 2
> >> >> features: layering
> >> >> flags:
> >> >>
> >> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
> >> >> (output cutted)
> >> >> rbd_data.ed9d394a851426.000000000000447c
> >> >> rbd_data.ed9d394a851426.0000000000010857
> >> >> rbd_data.ed9d394a851426.000000000000ec8b
> >> >> rbd_data.ed9d394a851426.000000000000fa43
> >> >> rbd_data.ed9d394a851426.000000000001ef2d
> >> >> ^C
> >> >>
> >> >> it hangs on this object and doesn't go any further. rbd cp also hangs...
> >> >> rbd map - also...
> >> >>
> >> >> can You advise what the solution could be for this case?
> >>
> >> > The hang is due to OSD throttling (see my first reply for how to work
> >> > around that and get a pg query). But you already did that, and the cluster
> >> > told you which OSDs it needs to see up in order for it to peer and
> >> > recover. If you haven't destroyed those disks, you should start those
>
> >> > osds and it should be fine. If you've destroyed the data or the disks are
> >> > truly broken and dead, then you can mark those OSDs lost and the cluster
> >> > *may* recover (but it's hard to say given the information you've shared).
>
> > This
>
>
> [root@cc1 ~]# ceph osd lost 10 --yes-i-really-mean-it
> marked osd lost in epoch 115310
> [root@cc1 ~]# ceph osd lost 37 --yes-i-really-mean-it
> marked osd lost in epoch 115314
> [root@cc1 ~]# ceph osd lost 72 --yes-i-really-mean-it
> marked osd lost in epoch 115317
> [root@cc1 ~]# ceph -s
> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
> health HEALTH_WARN
> 2 pgs down
> 2 pgs peering
> 2 pgs stuck inactive
> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
> election epoch 872, quorum 0,1,2 cc1,cc2,cc3
> osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
> pgmap v67642483: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
> 76718 GB used, 107 TB / 182 TB avail
> 4030 active+clean
> 1 down+remapped+peering
> 1 down+peering
> client io 14624 kB/s rd, 31619 kB/s wr, 382 op/s rd, 228 op/s wr
> [root@cc1 ~]# ceph -s
> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
> health HEALTH_WARN
> 2 pgs down
> 2 pgs peering
> 2 pgs stuck inactive
> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
> election epoch 872, quorum 0,1,2 cc1,cc2,cc3
> osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
> pgmap v67642485: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
> 76718 GB used, 107 TB / 182 TB avail
> 4030 active+clean
> 1 down+remapped+peering
> 1 down+peering
> client io 17805 kB/s rd, 18787 kB/s wr, 215 op/s rd, 107 op/s wr
>
> >>
> >> > sage
> >>
> >> What information can I bring to You to show that it is recoverable?
> >>
> >> here are ceph -s and ceph health detail:
> >>
> >> [root@cc1 ~]# ceph -s
> >> cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
> >> health HEALTH_WARN
> >> 2 pgs down
> >> 2 pgs peering
> >> 2 pgs stuck inactive
> >> monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
> >> election epoch 872, quorum 0,1,2 cc1,cc2,cc3
> >> osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
> >> pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
> >> 76705 GB used, 107 TB / 182 TB avail
> >> 4030 active+clean
> >> 1 down+remapped+peering
> >> 1 down+peering
> >> client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
> >> [root@cc1 ~]# ceph health detail
> >> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
> >> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
> >> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
> >> pg 1.60 is down+remapped+peering, acting [66,40]
> >> pg 1.165 is down+peering, acting [67,88,48]
> >> [root@cc1 ~]#
> >>
> >> --
> >> Regards,
> >> Łukasz Chrustek
> >>
>
>
>
> --
> Pozdrowienia,
> Łukasz Chrustek
>
* Re: Problem with query and any operation on PGs
2017-05-24 14:47 ` Sage Weil
@ 2017-05-24 15:00 ` Łukasz Chrustek
2017-05-24 15:07 ` Łukasz Chrustek
2017-05-24 15:11 ` Sage Weil
2017-05-24 21:38 ` Łukasz Chrustek
1 sibling, 2 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:00 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>>
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >>
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Cześć,
>> >> >>
>> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Cześć,
>> >> >> >>
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution. I
>> >> >> >> >> did as You wrote, but turning off these osds
>> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>> >> >> >>
>> >> >> >> > The important bit is:
>> >> >> >>
>> >> >> >> > "blocked": "peering is blocked due to down osds",
>> >> >> >> > "down_osds_we_would_probe": [
>> >> >> >> > 6,
>> >> >> >> > 10,
>> >> >> >> > 33,
>> >> >> >> > 37,
>> >> >> >> > 72
>> >> >> >> > ],
>> >> >> >> > "peering_blocked_by": [
>> >> >> >> > {
>> >> >> >> > "osd": 6,
>> >> >> >> > "current_lost_at": 0,
>> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> >> >> >> > },
>> >> >> >> > {
>> >> >> >> > "osd": 10,
>> >> >> >> > "current_lost_at": 0,
>> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> >> >> >> > },
>> >> >> >> > {
>> >> >> >> > "osd": 37,
>> >> >> >> > "current_lost_at": 0,
>> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> >> >> >> > },
>> >> >> >> > {
>> >> >> >> > "osd": 72,
>> >> >> >> > "current_lost_at": 113771,
>> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>>
>> > These are the osds (6, 10, 37, 72).
>>
>> >> >> >> > }
>> >> >> >> > ]
>> >> >> >> > },
>> >> >> >>
>> >> >> >> > Are any of those OSDs startable?
>>
>> > This
>>
>> osd 6 - isn't startable
> Disk completely dead, or just broken enough that ceph-osd won't
> start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> from this osd to recover any important writes on that osd.
2017-05-24 11:21:23.341938 7f6830a36940 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
2017-05-24 11:21:23.350180 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
2017-05-24 11:21:23.350610 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
2017-05-24 11:21:23.350617 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-05-24 11:21:23.350633 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
2017-05-24 11:21:23.351897 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-05-24 11:21:23.351951 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
2017-05-24 11:21:23.351970 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
2017-05-24 11:21:23.351981 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
2017-05-24 11:21:23.351984 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
2017-05-24 11:21:23.351987 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
2017-05-24 11:21:23.351996 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
2017-05-24 11:21:23.352573 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
2017-05-24 11:21:23.353001 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
2017-05-24 11:21:23.353012 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
2017-05-24 11:21:23.353016 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
2017-05-24 11:21:23.353021 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
2017-05-24 11:21:23.353022 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
2017-05-24 11:21:23.353027 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
2017-05-24 11:21:23.355156 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m
That is all I get in the logs for this osd when I try to start it.
>> osd 10, 37, 72 are startable
> With those started, I'd repeat the original sequence and get a fresh pg
> query to confirm that it still wants just osd.6.
You mean the procedure with the loop, taking down the OSDs that the broken
PGs are pointing to?
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+peering, acting [67,88,48]
For pg 1.60 <--> take 66 down, then check pg query in a loop?
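That loop can be sketched as another dry-run script that only prints the commands. The PG id and acting primary are taken from the `ceph health detail` output earlier in the thread; the `systemctl` unit name is an assumption, and the query is written with the standard `ceph pg <pgid> query` form rather than the `ceph tell` spelling used later:

```shell
#!/bin/sh
# Dry-run sketch: stop the acting primary of the stuck PG so the throttled
# query can get through, poll `ceph pg query` until it answers, then restart
# the OSD. Prints the commands instead of running them.
PG=1.60        # pg 1.60 is down+remapped+peering, acting [66,40]
PRIMARY=66     # acting primary of that PG, from the health output

PLAN=""
plan() { PLAN="$PLAN$*
"; echo "$*"; }   # record and print each command instead of executing it

plan systemctl stop ceph-osd@$PRIMARY
# Retry until the query stops hanging, then inspect the peering section.
plan "until ceph pg $PG query > /tmp/query-$PG.json; do sleep 1; done"
plan "grep -A6 down_osds_we_would_probe /tmp/query-$PG.json"
plan systemctl start ceph-osd@$PRIMARY
```

The `down_osds_we_would_probe` list in the query output is what shows whether peering still wants osd.6 after the other OSDs are back up.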
> use ceph-objectstore-tool to export the pg from osd.6, stop some other
> random osd (not one of these ones), import the pg into that osd, and start
> it again. Once it is up, run 'ceph osd lost 6'. The pg *should* peer at that
> point. Repeat the same basic process with the other pg.
I have already done 'ceph osd lost 6'; do I need to do it once again?
--
Regards
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 15:00 ` Łukasz Chrustek
@ 2017-05-24 15:07 ` Łukasz Chrustek
2017-05-24 15:11 ` Sage Weil
1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:07 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
> That is all I get in the logs for this osd when I try to start it.
>>> osd 10, 37, 72 are startable
>> With those started, I'd repeat the original sequence and get a fresh pg
>> query to confirm that it still wants just osd.6.
> You mean the procedure with the loop, taking down the OSDs that the broken
> PGs are pointing to?
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+peering, acting [67,88,48]
> for pg 1.60 <--> 66 down, then in loop check pg query ?
>> use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> random osd (not one of these ones), import the pg into that osd, and start
>> it again. Once it is up, run 'ceph osd lost 6'. The pg *should* peer at that
>> point. Repeat the same basic process with the other pg.
> I have already done 'ceph osd lost 6'; do I need to do it once again?
/dev/sdb1 3,7T 34M 3,7T 1% /var/lib/ceph/osd/ceph-6
This disk has no data; it was migrated away while this osd was still able
to come up.
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 15:00 ` Łukasz Chrustek
2017-05-24 15:07 ` Łukasz Chrustek
@ 2017-05-24 15:11 ` Sage Weil
2017-05-24 15:24 ` Łukasz Chrustek
2017-05-24 15:54 ` Łukasz Chrustek
1 sibling, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-24 15:11 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >>
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Cześć,
> >> >>
> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> >> Cześć,
> >> >> >>
> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> >> Cześć,
> >> >> >> >>
> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution. I
> >> >> >> >> >> did as You wrote, but turning off these osds
> >> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
> >> >> >> >>
> >> >> >> >> > The important bit is:
> >> >> >> >>
> >> >> >> >> > "blocked": "peering is blocked due to down osds",
> >> >> >> >> > "down_osds_we_would_probe": [
> >> >> >> >> > 6,
> >> >> >> >> > 10,
> >> >> >> >> > 33,
> >> >> >> >> > 37,
> >> >> >> >> > 72
> >> >> >> >> > ],
> >> >> >> >> > "peering_blocked_by": [
> >> >> >> >> > {
> >> >> >> >> > "osd": 6,
> >> >> >> >> > "current_lost_at": 0,
> >> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> >> >> >> > },
> >> >> >> >> > {
> >> >> >> >> > "osd": 10,
> >> >> >> >> > "current_lost_at": 0,
> >> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> >> >> >> > },
> >> >> >> >> > {
> >> >> >> >> > "osd": 37,
> >> >> >> >> > "current_lost_at": 0,
> >> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> >> >> >> > },
> >> >> >> >> > {
> >> >> >> >> > "osd": 72,
> >> >> >> >> > "current_lost_at": 113771,
> >> >> >> >> > "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >>
> >> > These are the osds (6, 10, 37, 72).
> >>
> >> >> >> >> > }
> >> >> >> >> > ]
> >> >> >> >> > },
> >> >> >> >>
> >> >> >> >> > Are any of those OSDs startable?
> >>
> >> > This
> >>
> >> osd 6 - isn't startable
>
> > Disk completely dead, or just broken enough that ceph-osd won't
> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> > from this osd to recover any important writes on that osd.
>
> 2017-05-24 11:21:23.341938 7f6830a36940 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
> 2017-05-24 11:21:23.350180 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
> 2017-05-24 11:21:23.350610 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
> 2017-05-24 11:21:23.350617 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2017-05-24 11:21:23.350633 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
> 2017-05-24 11:21:23.351897 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-05-24 11:21:23.351951 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
> 2017-05-24 11:21:23.351970 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
> 2017-05-24 11:21:23.351981 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
> 2017-05-24 11:21:23.351984 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
> 2017-05-24 11:21:23.351987 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
> 2017-05-24 11:21:23.351996 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
> 2017-05-24 11:21:23.352573 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
> 2017-05-24 11:21:23.353001 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
> 2017-05-24 11:21:23.353012 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
> 2017-05-24 11:21:23.353016 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
> 2017-05-24 11:21:23.353021 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
> 2017-05-24 11:21:23.353022 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
> 2017-05-24 11:21:23.353027 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
> 2017-05-24 11:21:23.355156 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
> 2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m
>
> That is all I get in the logs for this osd when I try to start it.
>
> >> osd 10, 37, 72 are startable
>
> > With those started, I'd repeat the original sequence and get a fresh pg
> > query to confirm that it still wants just osd.6.
>
> You mean the procedure with the loop, taking down the OSDs that the broken
> PGs are pointing to?
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+peering, acting [67,88,48]
>
> For pg 1.60 <--> take 66 down, then check pg query in a loop?
Right.
> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> > random osd (not one of these ones), import the pg into that osd, and start
> > it again. Once it is up, run 'ceph osd lost 6'. The pg *should* peer at that
> > point. Repeat the same basic process with the other pg.
>
> I have already done 'ceph osd lost 6'; do I need to do it once again?
Hmm not sure, if the OSD is empty then there is no harm in doing it again.
Try that first since it might resolve it. If not, do the query loop
above.
s
* Re: Problem with query and any operation on PGs
2017-05-24 15:11 ` Sage Weil
@ 2017-05-24 15:24 ` Łukasz Chrustek
2017-05-24 15:54 ` Łukasz Chrustek
1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:24 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
>>
>> >> osd 10, 37, 72 are startable
>>
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>>
>> You mean the procedure with the loop, taking down the OSDs that the broken
>> PGs are pointing to?
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>>
>> For pg 1.60 <--> take 66 down, then check pg query in a loop?
> Right.
>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > random osd (not one of these ones), import the pg into that osd, and start
>> > it again. Once it is up, run 'ceph osd lost 6'. The pg *should* peer at that
>> > point. Repeat the same basic process with the other pg.
>>
>> I have already done 'ceph osd lost 6'; do I need to do it once again?
> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
> Try that first since it might resolve it. If not, do the query loop
> above.
[root@cc1 ~]# ceph osd lost 6 --yes-i-really-mean-it
marked osd lost in epoch 113414
[root@cc1 ~]#
[root@cc1 ~]# ceph -s
cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
health HEALTH_WARN
2 pgs down
2 pgs peering
2 pgs stuck inactive
monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
election epoch 872, quorum 0,1,2 cc1,cc2,cc3
osdmap e115449: 100 osds: 88 up, 86 in; 1 remapped pgs
pgmap v67646402: 4032 pgs, 18 pools, 26733 GB data, 4862 kobjects
76759 GB used, 107 TB / 182 TB avail
4030 active+clean
1 down+peering
1 down+remapped+peering
client io 57154 kB/s rd, 1189 kB/s wr, 95 op/s
There is no effect after marking this osd as lost again.
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 15:11 ` Sage Weil
2017-05-24 15:24 ` Łukasz Chrustek
@ 2017-05-24 15:54 ` Łukasz Chrustek
2017-05-24 16:02 ` Łukasz Chrustek
1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:54 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >>
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Cześć,
>> >> >>
>> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Cześć,
>> >> >> >>
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> Cześć,
>> >> >> >> >>
>> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution. I
>> >> >> >> >> >> did as You wrote, but turning off these osds
>> >> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>> >> >> >> >>
>> >> >> >> >> > The important bit is:
>> >> >> >> >>
>> >> >> >> >> > "blocked": "peering is blocked due to down osds",
>> >> >> >> >> > "down_osds_we_would_probe": [
>> >> >> >> >> > 6,
>> >> >> >> >> > 10,
>> >> >> >> >> > 33,
>> >> >> >> >> > 37,
>> >> >> >> >> > 72
>> >> >> >> >> > ],
>> >> >> >> >> > "peering_blocked_by": [
>> >> >> >> >> > {
>> >> >> >> >> > "osd": 6,
>> >> >> >> >> > "current_lost_at": 0,
>> >> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> >> >> >> > },
>> >> >> >> >> > {
>> >> >> >> >> > "osd": 10,
>> >> >> >> >> > "current_lost_at": 0,
>> >> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> >> >> >> > },
>> >> >> >> >> > {
>> >> >> >> >> > "osd": 37,
>> >> >> >> >> > "current_lost_at": 0,
>> >> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> >> >> >> > },
>> >> >> >> >> > {
>> >> >> >> >> > "osd": 72,
>> >> >> >> >> > "current_lost_at": 113771,
>> >> >> >> >> > "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >>
>> >> > These are the osds (6, 10, 37, 72).
>> >>
>> >> >> >> >> > }
>> >> >> >> >> > ]
>> >> >> >> >> > },
>> >> >> >> >>
>> >> >> >> >> > Are any of those OSDs startable?
>> >>
>> >> > This
>> >>
>> >> osd 6 - isn't startable
>>
>> > Disk completely dead, or just broken enough that ceph-osd won't
>> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>>
>> 2017-05-24 11:21:23.341938 7f6830a36940 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
>> 2017-05-24 11:21:23.350180 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
>> 2017-05-24 11:21:23.350610 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
>> 2017-05-24 11:21:23.350617 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2017-05-24 11:21:23.350633 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
>> 2017-05-24 11:21:23.351897 7f6830a36940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2017-05-24 11:21:23.351951 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
>> 2017-05-24 11:21:23.351970 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
>> 2017-05-24 11:21:23.351981 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
>> 2017-05-24 11:21:23.351984 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.351987 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
>> 2017-05-24 11:21:23.351996 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
>> 2017-05-24 11:21:23.352573 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
>> 2017-05-24 11:21:23.353001 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
>> 2017-05-24 11:21:23.353012 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
>> 2017-05-24 11:21:23.353016 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
>> 2017-05-24 11:21:23.353021 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
>> 2017-05-24 11:21:23.353022 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.353027 7f6830a36940 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
>> 2017-05-24 11:21:23.355156 7f6830a36940 0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
>> 2017-05-24 11:21:23.356411 7f6830a36940 -1 ** ERROR: osd init failed: (22) Invalid argument
>>
>> it is all I get for this osd in logs, when I try to start it.
>>
>> >> osd 10, 37, 72 are startable
>>
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>>
>> You mean about procedure with loop and taking down OSDs, which broken
>> PGs are pointing to ?
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>>
>> for pg 1.60 <--> 66 down, then in loop check pg query ?
> Right.
And now it is very weird... I brought osd.37 up, and the loop
while true;do; ceph tell 1.165 query ;done
caught this:
https://pastebin.com/zKu06fJn
Can you tell what is wrong now?
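A side note on the loop above: the `while true;do;` form only parses in zsh; in bash the extra semicolon after `do` is a syntax error. A bash-safe polling sketch (the helper name and retry count are illustrative, not commands from the thread):

```shell
# Bash-safe polling loop; `poll_pg_query` and the retry count are
# illustrative. The original `while true;do; ...` form only parses in zsh.
poll_pg_query() {
  local pgid=$1 tries=${2:-30} i
  for ((i = 1; i <= tries; i++)); do
    # same invocation as used in the thread; succeeds once the PG answers
    if ceph tell "$pgid" query; then
      return 0
    fi
    sleep 1
  done
  echo "query for $pgid timed out after $tries attempts" >&2
  return 1
}
```

Run as e.g. `poll_pg_query 1.165` instead of an unbounded loop, so a hung cluster does not spin forever.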
>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
>> > point. repeat with the same basic process with the other pg.
>>
>> I have already did 'ceph osd lost 6', do I need to do this once again ?
> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
> Try that first since it might resolve it. If not, do the query loop
> above.
> s
--
Regards,
Łukasz Chrustek
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-24 15:54 ` Łukasz Chrustek
@ 2017-05-24 16:02 ` Łukasz Chrustek
2017-05-24 17:07 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 16:02 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> And now it is very weird.... I made osd.37 up, and loop
> while true;do; ceph tell 1.165 query ;done
I need to explain more here: all I did was start ceph-osd id=37 on the
storage node; in ceph osd tree this osd is marked as out:
-17 21.49995 host stor8
22 1.59999 osd.22 up 1.00000 1.00000
23 1.59999 osd.23 up 1.00000 1.00000
36 2.09999 osd.36 up 1.00000 1.00000
37 2.09999 osd.37 up 0 1.00000
38 2.50000 osd.38 up 1.00000 1.00000
39 2.50000 osd.39 up 1.00000 1.00000
40 2.50000 osd.40 up 0 1.00000
41 2.50000 osd.41 down 0 1.00000
42 2.50000 osd.42 up 1.00000 1.00000
43 1.59999 osd.43 up 1.00000 1.00000
after starting this osd, ceph tell 1.165 query worked for only one invocation of the command
> catch this:
> https://pastebin.com/zKu06fJn
> Can You tell, what is wrong now ?
>>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>>> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
>>> > point. repeat with the same basic process with the other pg.
>>>
>>> I have already did 'ceph osd lost 6', do I need to do this once again ?
>> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
>> Try that first since it might resolve it. If not, do the query loop
>> above.
>> s
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 16:02 ` Łukasz Chrustek
@ 2017-05-24 17:07 ` Łukasz Chrustek
2017-05-24 17:16 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:07 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
>> And now it is very weird.... I made osd.37 up, and loop
>> while true;do; ceph tell 1.165 query ;done
> Here need to explain more - all I did was start ceph-osd id=37 on
> storage node, in ceph osd tree this osd osd is marked as out:
> -17 21.49995 host stor8
> 22 1.59999 osd.22 up 1.00000 1.00000
> 23 1.59999 osd.23 up 1.00000 1.00000
> 36 2.09999 osd.36 up 1.00000 1.00000
> 37 2.09999 osd.37 up 0 1.00000
> 38 2.50000 osd.38 up 1.00000 1.00000
> 39 2.50000 osd.39 up 1.00000 1.00000
> 40 2.50000 osd.40 up 0 1.00000
> 41 2.50000 osd.41 down 0 1.00000
> 42 2.50000 osd.42 up 1.00000 1.00000
> 43 1.59999 osd.43 up 1.00000 1.00000
> after start of this osd, ceph tell 1.165 query worked only for one call of this command
>> catch this:
>> https://pastebin.com/zKu06fJn
here is the query output for pg 1.60:
https://pastebin.com/Xuk5iFXr
>> Can You tell, what is wrong now ?
>>>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>>>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>>>> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
>>>> > point. repeat with the same basic process with the other pg.
>>>>
>>>> I have already did 'ceph osd lost 6', do I need to do this once again ?
>>> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
>>> Try that first since it might resolve it. If not, do the query loop
>>> above.
>>> s
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 17:07 ` Łukasz Chrustek
@ 2017-05-24 17:16 ` Sage Weil
2017-05-24 17:28 ` Łukasz Chrustek
2017-05-24 17:30 ` Łukasz Chrustek
0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-24 17:16 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1424 bytes --]
On Wed, 24 May 2017, Łukasz Chrustek wrote:
>
> >> And now it is very weird.... I made osd.37 up, and loop
> >> while true;do; ceph tell 1.165 query ;done
>
> > Here need to explain more - all I did was start ceph-osd id=37 on
> > storage node, in ceph osd tree this osd osd is marked as out:
>
>
> > -17 21.49995 host stor8
> > 22 1.59999 osd.22 up 1.00000 1.00000
> > 23 1.59999 osd.23 up 1.00000 1.00000
> > 36 2.09999 osd.36 up 1.00000 1.00000
> > 37 2.09999 osd.37 up 0 1.00000
> > 38 2.50000 osd.38 up 1.00000 1.00000
> > 39 2.50000 osd.39 up 1.00000 1.00000
> > 40 2.50000 osd.40 up 0 1.00000
> > 41 2.50000 osd.41 down 0 1.00000
> > 42 2.50000 osd.42 up 1.00000 1.00000
> > 43 1.59999 osd.43 up 1.00000 1.00000
>
> > after start of this osd, ceph tell 1.165 query worked only for one call of this command
> >> catch this:
>
> >> https://pastebin.com/zKu06fJn
>
> here is for pg 1.60:
>
> https://pastebin.com/Xuk5iFXr
Look at the bottom, after it says
"blocked": "peering is blocked due to down osds",
Did the 1.165 pg recover?
sage
* Re: Problem with query and any operation on PGs
2017-05-24 17:16 ` Sage Weil
@ 2017-05-24 17:28 ` Łukasz Chrustek
2017-05-24 18:16 ` Sage Weil
2017-05-24 17:30 ` Łukasz Chrustek
1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:28 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>>
>> >> And now it is very weird.... I made osd.37 up, and loop
>> >> while true;do; ceph tell 1.165 query ;done
>>
>> > Here need to explain more - all I did was start ceph-osd id=37 on
>> > storage node, in ceph osd tree this osd osd is marked as out:
>>
>>
>> > -17 21.49995 host stor8
>> > 22 1.59999 osd.22 up 1.00000 1.00000
>> > 23 1.59999 osd.23 up 1.00000 1.00000
>> > 36 2.09999 osd.36 up 1.00000 1.00000
>> > 37 2.09999 osd.37 up 0 1.00000
>> > 38 2.50000 osd.38 up 1.00000 1.00000
>> > 39 2.50000 osd.39 up 1.00000 1.00000
>> > 40 2.50000 osd.40 up 0 1.00000
>> > 41 2.50000 osd.41 down 0 1.00000
>> > 42 2.50000 osd.42 up 1.00000 1.00000
>> > 43 1.59999 osd.43 up 1.00000 1.00000
>>
>> > after start of this osd, ceph tell 1.165 query worked only for one call of this command
>> >> catch this:
>>
>> >> https://pastebin.com/zKu06fJn
>>
>> here is for pg 1.60:
>>
>> https://pastebin.com/Xuk5iFXr
> Look at the bottom, after it says
> "blocked": "peering is blocked due to down osds",
> Did the 1.165 pg recover?
No it didn't:
[root@cc1 ~]# ceph health detail
HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
pg 1.60 is down+remapped+peering, acting [68]
pg 1.165 is incomplete, acting [67,88,48]
[root@cc1 ~]#
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 17:16 ` Sage Weil
2017-05-24 17:28 ` Łukasz Chrustek
@ 2017-05-24 17:30 ` Łukasz Chrustek
2017-05-24 17:35 ` Łukasz Chrustek
1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:30 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>>
>> >> And now it is very weird.... I made osd.37 up, and loop
>> >> while true;do; ceph tell 1.165 query ;done
>>
>> > Here need to explain more - all I did was start ceph-osd id=37 on
>> > storage node, in ceph osd tree this osd osd is marked as out:
>>
>>
>> > -17 21.49995 host stor8
>> > 22 1.59999 osd.22 up 1.00000 1.00000
>> > 23 1.59999 osd.23 up 1.00000 1.00000
>> > 36 2.09999 osd.36 up 1.00000 1.00000
>> > 37 2.09999 osd.37 up 0 1.00000
>> > 38 2.50000 osd.38 up 1.00000 1.00000
>> > 39 2.50000 osd.39 up 1.00000 1.00000
>> > 40 2.50000 osd.40 up 0 1.00000
>> > 41 2.50000 osd.41 down 0 1.00000
>> > 42 2.50000 osd.42 up 1.00000 1.00000
>> > 43 1.59999 osd.43 up 1.00000 1.00000
>>
>> > after start of this osd, ceph tell 1.165 query worked only for one call of this command
>> >> catch this:
>>
>> >> https://pastebin.com/zKu06fJn
>>
>> here is for pg 1.60:
>>
>> https://pastebin.com/Xuk5iFXr
> Look at the bottom, after it says
> "blocked": "peering is blocked due to down osds",
for pg 1.60: all osds were down when ceph tell 1.60 query caught one
'interrupt'.
> Did the 1.165 pg recover?
> sage
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 17:30 ` Łukasz Chrustek
@ 2017-05-24 17:35 ` Łukasz Chrustek
0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:35 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
>> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>>>
>>> >> And now it is very weird.... I made osd.37 up, and loop
>>> >> while true;do; ceph tell 1.165 query ;done
>>>
>>> > Here need to explain more - all I did was start ceph-osd id=37 on
>>> > storage node, in ceph osd tree this osd osd is marked as out:
>>>
>>>
>>> > -17 21.49995 host stor8
>>> > 22 1.59999 osd.22 up 1.00000 1.00000
>>> > 23 1.59999 osd.23 up 1.00000 1.00000
>>> > 36 2.09999 osd.36 up 1.00000 1.00000
>>> > 37 2.09999 osd.37 up 0 1.00000
>>> > 38 2.50000 osd.38 up 1.00000 1.00000
>>> > 39 2.50000 osd.39 up 1.00000 1.00000
>>> > 40 2.50000 osd.40 up 0 1.00000
>>> > 41 2.50000 osd.41 down 0 1.00000
>>> > 42 2.50000 osd.42 up 1.00000 1.00000
>>> > 43 1.59999 osd.43 up 1.00000 1.00000
>>>
>>> > after start of this osd, ceph tell 1.165 query worked only for one call of this command
>>> >> catch this:
>>>
>>> >> https://pastebin.com/zKu06fJn
>>>
>>> here is for pg 1.60:
>>>
>>> https://pastebin.com/Xuk5iFXr
>> Look at the bottom, after it says
>> "blocked": "peering is blocked due to down osds",
> for pg 1.60: all osds was down, when ceph tell 1.60 query catch one
> 'interrupt'.
when I try to use ceph-objectstore-tool I get:
[root@stor3 ~]# ceph-objectstore-tool --op export --pgid 1.60 --data-path /mnt --journal-path /mnt/journal --file 1.60.export
Mount failed with '(95) Operation not supported'
[root@stor3 ~]# du -sh /mnt/current/1.60_head
276M /mnt/current/1.60_head
[root@stor3 ~]# ls -al /mnt/current/1.60_head | wc -l
49
[root@stor3 ~]#
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 17:28 ` Łukasz Chrustek
@ 2017-05-24 18:16 ` Sage Weil
2017-05-24 19:47 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 18:16 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >>
> >> >> And now it is very weird.... I made osd.37 up, and loop
> >> >> while true;do; ceph tell 1.165 query ;done
> >>
> >> > Here need to explain more - all I did was start ceph-osd id=37 on
> >> > storage node, in ceph osd tree this osd osd is marked as out:
> >>
> >>
> >> > -17 21.49995 host stor8
> >> > 22 1.59999 osd.22 up 1.00000 1.00000
> >> > 23 1.59999 osd.23 up 1.00000 1.00000
> >> > 36 2.09999 osd.36 up 1.00000 1.00000
> >> > 37 2.09999 osd.37 up 0 1.00000
> >> > 38 2.50000 osd.38 up 1.00000 1.00000
> >> > 39 2.50000 osd.39 up 1.00000 1.00000
> >> > 40 2.50000 osd.40 up 0 1.00000
> >> > 41 2.50000 osd.41 down 0 1.00000
> >> > 42 2.50000 osd.42 up 1.00000 1.00000
> >> > 43 1.59999 osd.43 up 1.00000 1.00000
> >>
> >> > after start of this osd, ceph tell 1.165 query worked only for one call of this command
> >> >> catch this:
> >>
> >> >> https://pastebin.com/zKu06fJn
> >>
> >> here is for pg 1.60:
> >>
> >> https://pastebin.com/Xuk5iFXr
>
> > Look at the bottom, after it says
>
> > "blocked": "peering is blocked due to down osds",
>
> > Did the 1.165 pg recover?
>
> No it didn't:
>
> [root@cc1 ~]# ceph health detail
> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
> pg 1.60 is down+remapped+peering, acting [68]
> pg 1.165 is incomplete, acting [67,88,48]
> [root@cc1 ~]#
Hrm.
ceph daemon osd.67 config set debug_osd 20
ceph daemon osd.67 config set debug_ms 1
ceph osd down 67
and capture the log resulting log segment, then post it with
ceph-post-file.
sage
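The capture sequence above can be wrapped in one helper. The log path, the 60-second settle time, and the function name are assumptions for illustration, not part of Sage's instructions:

```shell
# Wrapper for the debug-capture sequence; log path and settle time are
# assumed, adjust for your deployment.
capture_osd_debug() {
  local osd=$1
  local log=/var/log/ceph/ceph-osd.$osd.log
  ceph daemon osd.$osd config set debug_osd 20
  ceph daemon osd.$osd config set debug_ms 1
  ceph osd down $osd   # mark it down so it re-peers with verbose logging
  sleep 60             # let the peering attempt play out before uploading
  ceph-post-file "$log"
}
```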
* Re: Problem with query and any operation on PGs
2017-05-24 18:16 ` Sage Weil
@ 2017-05-24 19:47 ` Łukasz Chrustek
0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 19:47 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >>
>> >> >> And now it is very weird.... I made osd.37 up, and loop
>> >> >> while true;do; ceph tell 1.165 query ;done
>> >>
>> >> > Here need to explain more - all I did was start ceph-osd id=37 on
>> >> > storage node, in ceph osd tree this osd osd is marked as out:
>> >>
>> >>
>> >> > -17 21.49995 host stor8
>> >> > 22 1.59999 osd.22 up 1.00000 1.00000
>> >> > 23 1.59999 osd.23 up 1.00000 1.00000
>> >> > 36 2.09999 osd.36 up 1.00000 1.00000
>> >> > 37 2.09999 osd.37 up 0 1.00000
>> >> > 38 2.50000 osd.38 up 1.00000 1.00000
>> >> > 39 2.50000 osd.39 up 1.00000 1.00000
>> >> > 40 2.50000 osd.40 up 0 1.00000
>> >> > 41 2.50000 osd.41 down 0 1.00000
>> >> > 42 2.50000 osd.42 up 1.00000 1.00000
>> >> > 43 1.59999 osd.43 up 1.00000 1.00000
>> >>
>> >> > after start of this osd, ceph tell 1.165 query worked only for one call of this command
>> >> >> catch this:
>> >>
>> >> >> https://pastebin.com/zKu06fJn
>> >>
>> >> here is for pg 1.60:
>> >>
>> >> https://pastebin.com/Xuk5iFXr
>>
>> > Look at the bottom, after it says
>>
>> > "blocked": "peering is blocked due to down osds",
>>
>> > Did the 1.165 pg recover?
>>
>> No it didn't:
>>
>> [root@cc1 ~]# ceph health detail
>> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
>> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
>> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
>> pg 1.60 is down+remapped+peering, acting [68]
>> pg 1.165 is incomplete, acting [67,88,48]
>> [root@cc1 ~]#
> Hrm.
> ceph daemon osd.67 config set debug_osd 20
> ceph daemon osd.67 config set debug_ms 1
> ceph osd down 67
> and capture the log resulting log segment, then post it with
> ceph-post-file.
args: -- /var/log/ceph/ceph-osd.67.log
/usr/bin/ceph-post-file: upload tag 05a02f14-8fd6-43da-9b9c-e42cd1fce560
/usr/bin/ceph-post-file: user: root@stor3
/usr/bin/ceph-post-file: will upload file /var/log/ceph/ceph-osd.67.log
sftp> mkdir post/05a02f14-8fd6-43da-9b9c-e42cd1fce560_root@stor3_8612f2d9-bb31-4d5e-b3e7-3722f8d13314
sftp> cd post/05a02f14-8fd6-43da-9b9c-e42cd1fce560_root@stor3_8612f2d9-bb31-4d5e-b3e7-3722f8d13314
sftp> put /tmp/tmp.rggR3suNMt user
sftp> put /var/log/ceph/ceph-osd.67.log
/usr/bin/ceph-post-file: copy the upload id below to share with a dev:
ceph-post-file: 05a02f14-8fd6-43da-9b9c-e42cd1fce560
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 14:47 ` Sage Weil
2017-05-24 15:00 ` Łukasz Chrustek
@ 2017-05-24 21:38 ` Łukasz Chrustek
2017-05-24 21:53 ` Sage Weil
1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 21:38 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
>>
>> > This
>>
>> osd 6 - isn't startable
> Disk completely 100% dead, or just borken enough that ceph-osd won't
> start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> from this osd to recover any important writes on that osd.
>> osd 10, 37, 72 are startable
> With those started, I'd repeat the original sequence and get a fresh pg
> query to confirm that it still wants just osd.6.
> use ceph-objectstore-tool to export the pg from osd.6, stop some other
> ranodm osd (not one of these ones), import the pg into that osd, and start
> again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
> point. repeat with the same basic process with the other pg.
Here is the output from ceph-objectstore-tool - it also didn't succeed:
https://pastebin.com/7XGAHdKH
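For reference, the export/import procedure quoted above can be sketched as one helper. The osd ids and paths are illustrative; the export flags mirror the ceph-objectstore-tool invocation used elsewhere in the thread, and `--op import` is assumed to accept the same `--file`:

```shell
# Sketch of the export -> import dance (osd ids and paths illustrative;
# --op import is assumed to take the --file produced by --op export).
pg_move() {
  local pgid=$1 src=$2 dst=$3 dump=/tmp/$1.export
  ceph-objectstore-tool --op export --pgid "$pgid" \
    --data-path "/var/lib/ceph/osd/ceph-$src" \
    --journal-path "/var/lib/ceph/osd/ceph-$src/journal" \
    --file "$dump" || return 1
  # the destination osd must be stopped before importing
  ceph-objectstore-tool --op import \
    --data-path "/var/lib/ceph/osd/ceph-$dst" \
    --journal-path "/var/lib/ceph/osd/ceph-$dst/journal" \
    --file "$dump"
}
# afterwards: start the destination osd, then `ceph osd lost 6`
```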
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 21:38 ` Łukasz Chrustek
@ 2017-05-24 21:53 ` Sage Weil
2017-05-24 22:09 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 21:53 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> >>
> >> > This
> >>
> >> osd 6 - isn't startable
>
> > Disk completely 100% dead, or just borken enough that ceph-osd won't
> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> > from this osd to recover any important writes on that osd.
>
> >> osd 10, 37, 72 are startable
>
> > With those started, I'd repeat the original sequence and get a fresh pg
> > query to confirm that it still wants just osd.6.
>
> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> > ranodm osd (not one of these ones), import the pg into that osd, and start
> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
> > point. repeat with the same basic process with the other pg.
>
> Here is output from ceph-objectstore-tool - also didn't success:
>
> https://pastebin.com/7XGAHdKH
Hmm, btrfs:
2017-05-24 23:28:58.547456 7f500948e940 -1
filestore(/var/lib/ceph/osd/ceph-84) ERROR:
/var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
losing new data
You could try setting --osd-use-stale-snap as suggested.
Is it the same error with the other one?
Looking at the log you sent earlier for 1.165 on osd.67, the primary
reports:
2017-05-24 21:37:11.505256 7efdbc1e5700 5 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] enter Started/Primary/Peering/GetLog
2017-05-24 21:37:11.505291 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.37 1.165( v 112598'67281552 (112574'67278547,112598'67281552] lb 1/56500165/rbd_data.674a3ed7dffd473.0000000000000b38/head (NIBBLEWISE) local-les=112584 n=1 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505299 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.38 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505306 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.48 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505313 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.67 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505319 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.88 1.165( empty local-les=0 n=0 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505326 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] choose_acting failed
in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
are mid-backfill) and 68 has nothing. Some data is lost unless you can
recovery another OSD with that PG.
The set of OSDs that might have data are: 6,10,33,72,84
If that bears no fruit, then you can force last_backfill to report
complete on one of those OSDs and it'll think it has all the data even
though some of it is likely gone. (We can pick one that is farther
along... 38 48 and 67 seem to all match.)
sage
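One possible shortcut, offered as an assumption to verify rather than a confirmed fix: newer ceph-objectstore-tool builds ship an `--op mark-complete` that forces a stopped OSD's copy of a PG to report itself complete, which matches what Sage describes doing manually with last_backfill. Check `ceph-objectstore-tool --help` on this cluster's version before relying on it:

```shell
# Hedged sketch: --op mark-complete (availability on this cluster's version
# is an assumption) forces the PG copy on a stopped OSD to report complete.
mark_pg_complete() {
  local osd=$1 pgid=$2
  ceph-objectstore-tool --op mark-complete --pgid "$pgid" \
    --data-path "/var/lib/ceph/osd/ceph-$osd" \
    --journal-path "/var/lib/ceph/osd/ceph-$osd/journal"
}
```

As noted above, pick a replica that is farthest along (38, 48 or 67 here), since marking complete declares any missing writes lost.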
* Re: Problem with query and any operation on PGs
2017-05-24 21:53 ` Sage Weil
@ 2017-05-24 22:09 ` Łukasz Chrustek
2017-05-24 22:27 ` Sage Weil
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 22:09 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> >>
>> >> > This
>> >>
>> >> osd 6 - isn't startable
>>
>> > Disk completely 100% dead, or just borken enough that ceph-osd won't
>> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>>
>> >> osd 10, 37, 72 are startable
>>
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>>
>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
>> > point. repeat with the same basic process with the other pg.
>>
>> Here is output from ceph-objectstore-tool - also didn't success:
>>
>> https://pastebin.com/7XGAHdKH
> Hmm, btrfs:
> 2017-05-24 23:28:58.547456 7f500948e940 -1
> filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> losing new data
> You could try setting --osd-use-stale-snap as suggested.
Yes... tried... and I simply get rided of 39GB data...
> Is it the same error with the other one?
Yes: https://pastebin.com/7XGAHdKH
> in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> are mid-backfill) and 68 has nothing. Some data is lost unless you can
> recovery another OSD with that PG.
> The set of OSDs that might have data are: 6,10,33,72,84
> If that bears no fruit, then you can force last_backfill to report
how do I force last_backfill?
> complete on one of those OSDs and it'll think it has all the data even
> though some of it is likely gone. (We can pick one that is farther
> along... 38 48 and 67 seem to all match.)
> sage
--
Regards,
Łukasz Chrustek
* Re: Problem with query and any operation on PGs
2017-05-24 22:09 ` Łukasz Chrustek
@ 2017-05-24 22:27 ` Sage Weil
2017-05-24 22:46 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 22:27 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Hello,
> >>
> >> >>
> >> >> > This
> >> >>
> >> >> osd 6 - isn't startable
> >>
> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't
> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> > from this osd to recover any important writes on that osd.
> >>
> >> >> osd 10, 37, 72 are startable
> >>
> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> > query to confirm that it still wants just osd.6.
> >>
> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> > ranodm osd (not one of these ones), import the pg into that osd, and start
> >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
> >> > point. repeat with the same basic process with the other pg.
> >>
> >> Here is output from ceph-objectstore-tool - also didn't success:
> >>
> >> https://pastebin.com/7XGAHdKH
>
> > Hmm, btrfs:
>
> > 2017-05-24 23:28:58.547456 7f500948e940 -1
> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> > losing new data
>
> > You could try setting --osd-use-stale-snap as suggested.
>
> Yes... tried... and I simply get rided of 39GB data...
What does "get rided" mean?
>
> > Is it the same error with the other one?
>
> Yes: https://pastebin.com/7XGAHdKH
>
>
>
>
> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> > are mid-backfill) and 68 has nothing. Some data is lost unless you can
> > recovery another OSD with that PG.
>
> > The set of OSDs that might have data are: 6,10,33,72,84
>
> > If that bears no fruit, then you can force last_backfill to report
>
> how to force last_backfill ?
>
> > complete on one of those OSDs and it'll think it has all the data even
> > though some of it is likely gone. (We can pick one that is farther
> > along... 38 48 and 67 seem to all match.)
>
> > sage
>
>
>
> --
> Regards,
> Łukasz Chrustek
>
* Re: Problem with query and any operation on PGs
2017-05-24 22:27 ` Sage Weil
@ 2017-05-24 22:46 ` Łukasz Chrustek
2017-05-25 2:06 ` Sage Weil
2017-05-30 13:21 ` Sage Weil
0 siblings, 2 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 22:46 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Thu, 25 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Hello,
>> >>
>> >> >>
>> >> >> > This
>> >> >>
>> >> >> osd 6 - isn't startable
>> >>
>> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't
>> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> >> > from this osd to recover any important writes on that osd.
>> >>
>> >> >> osd 10, 37, 72 are startable
>> >>
>> >> > With those started, I'd repeat the original sequence and get a fresh pg
>> >> > query to confirm that it still wants just osd.6.
>> >>
>> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> >> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
>> >> > point. repeat with the same basic process with the other pg.
>> >>
>> >> Here is output from ceph-objectstore-tool - also didn't success:
>> >>
>> >> https://pastebin.com/7XGAHdKH
>>
>> > Hmm, btrfs:
>>
>> > 2017-05-24 23:28:58.547456 7f500948e940 -1
>> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
>> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
>> > losing new data
>>
>> > You could try setting --osd-use-stale-snap as suggested.
>>
>> Yes... tried... and I simply get rided of 39GB data...
> What does "get rided" mean?
according to this pastebin: https://pastebin.com/QPcpkjg4
ls -R /var/lib/ceph/osd/ceph-33/current/
/var/lib/ceph/osd/ceph-33/current/:
commit_op_seq omap
/var/lib/ceph/osd/ceph-33/current/omap:
000003.log CURRENT LOCK MANIFEST-000002
earlier there were data files there as well.
>>
>> > Is it the same error with the other one?
>>
>> Yes: https://pastebin.com/7XGAHdKH
>>
>>
>>
>>
>> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
>> > are mid-backfill) and 68 has nothing. Some data is lost unless you can
>> > recover another OSD with that PG.
>>
>> > The set of OSDs that might have data are: 6,10,33,72,84
>>
>> > If that bears no fruit, then you can force last_backfill to report
>> complete on one of those OSDs and it'll think it has all the data even
>> though some of it is likely gone. (We can pick one that is farther
>> along... 38 48 and 67 seem to all match.)
Can you explain what you mean by 'force last_backfill to report
complete'? The current value for PG 1.60 is MAX and for 1.165 is
1\/db616165\/rbd_data.ed9979641a9d82.000000000001dcee\/head
--
Pozdrowienia,
Łukasz Chrustek
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-24 22:46 ` Łukasz Chrustek
@ 2017-05-25 2:06 ` Sage Weil
2017-05-25 11:22 ` Łukasz Chrustek
2017-05-30 13:21 ` Sage Weil
1 sibling, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-25 2:06 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3049 bytes --]
On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
> >> Hello,
> >>
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Hello,
> >> >>
> >> >> >>
> >> >> >> > This
> >> >> >>
> >> >> >> osd 6 - isn't startable
> >> >>
> >> >> > Disk completely 100% dead, or just broken enough that ceph-osd won't
> >> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> >> > from this osd to recover any important writes on that osd.
> >> >>
> >> >> >> osd 10, 37, 72 are startable
> >> >>
> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> >> > query to confirm that it still wants just osd.6.
> >> >>
> >> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> >> > random osd (not one of these ones), import the pg into that osd, and start
> >> >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
> >> >> > point. repeat with the same basic process with the other pg.
> >> >>
> >> >> Here is the output from ceph-objectstore-tool - it also didn't succeed:
> >> >>
> >> >> https://pastebin.com/7XGAHdKH
> >>
> >> > Hmm, btrfs:
> >>
> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1
> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> >> > losing new data
> >>
> >> > You could try setting --osd-use-stale-snap as suggested.
> >>
> >> Yes... tried... and I simply get rided of 39GB data...
>
> > What does "get rided" mean?
>
> according to this pastebin: https://pastebin.com/QPcpkjg4
>
> ls -R /var/lib/ceph/osd/ceph-33/current/
>
> /var/lib/ceph/osd/ceph-33/current/:
>
> commit_op_seq omap
>
>
>
> /var/lib/ceph/osd/ceph-33/current/omap:
>
> 000003.log CURRENT LOCK MANIFEST-000002
>
> earlier, the data files were there.
Yeah, looks like all the data was deleted from the device. :(
> >>
> >> > Is it the same error with the other one?
> >>
> >> Yes: https://pastebin.com/7XGAHdKH
> >>
> >>
> >>
> >>
> >> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> >> > are mid-backfill) and 68 has nothing. Some data is lost unless you can
> >> > recover another OSD with that PG.
> >>
> >> > The set of OSDs that might have data are: 6,10,33,72,84
> >>
> >> > If that bears no fruit, then you can force last_backfill to report
> >> complete on one of those OSDs and it'll think it has all the data even
> >> though some of it is likely gone. (We can pick one that is farther
> >> along... 38 48 and 67 seem to all match.)
>
> Can you explain what you mean by 'force last_backfill to report
> complete'? The current value for PG 1.60 is MAX and for 1.165 is
> 1\/db616165\/rbd_data.ed9979641a9d82.000000000001dcee\/head
ceph-objectstore-tool has a mark-complete operation. Do that on one of
the OSDs that has the more advanced last_backfill (like the one above).
After you restart, the PG should recover.
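Concretely, that would be something like the following sketch (the OSD number and PG id are taken from earlier in the thread for illustration; stop the chosen OSD before running the tool):

```shell
# With the OSD stopped, mark the PG complete on it:
ceph-objectstore-tool --op mark-complete --pgid 1.165 \
    --data-path /var/lib/ceph/osd/ceph-48 \
    --journal-path /var/lib/ceph/osd/ceph-48/journal

# Restart the OSD and watch the PG peer:
systemctl start ceph-osd@48
ceph pg 1.165 query
```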
Good luck!
sage
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-25 2:06 ` Sage Weil
@ 2017-05-25 11:22 ` Łukasz Chrustek
2017-05-29 15:31 ` Łukasz Chrustek
0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-25 11:22 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> On Thu, 25 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
>> >> Hello,
>> >>
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Hello,
>> >> >>
>> >> >> >>
>> >> >> >> > This
>> >> >> >>
>> >> >> >> osd 6 - isn't startable
>> >> >>
>> >> >> > Disk completely 100% dead, or just broken enough that ceph-osd won't
>> >> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> >> >> > from this osd to recover any important writes on that osd.
>> >> >>
>> >> >> >> osd 10, 37, 72 are startable
>> >> >>
>> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
>> >> >> > query to confirm that it still wants just osd.6.
>> >> >>
>> >> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> >> >> > random osd (not one of these ones), import the pg into that osd, and start
>> >> >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
>> >> >> > point. repeat with the same basic process with the other pg.
>> >> >>
>> >> >> Here is the output from ceph-objectstore-tool - it also didn't succeed:
>> >> >>
>> >> >> https://pastebin.com/7XGAHdKH
>> >>
>> >> > Hmm, btrfs:
>> >>
>> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1
>> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
>> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
>> >> > losing new data
>> >>
>> >> > You could try setting --osd-use-stale-snap as suggested.
>> >>
>> >> Yes... tried... and I simply get rided of 39GB data...
>>
>> > What does "get rided" mean?
>>
>> according to this pastebin: https://pastebin.com/QPcpkjg4
>>
>> ls -R /var/lib/ceph/osd/ceph-33/current/
>>
>> /var/lib/ceph/osd/ceph-33/current/:
>>
>> commit_op_seq omap
>>
>>
>>
>> /var/lib/ceph/osd/ceph-33/current/omap:
>>
>> 000003.log CURRENT LOCK MANIFEST-000002
>>
>> earlier, the data files were there.
> Yeah, looks like all the data was deleted from the device. :(
>> >>
>> >> > Is it the same error with the other one?
>> >>
>> >> Yes: https://pastebin.com/7XGAHdKH
>> >>
>> >>
>> >>
>> >>
>> >> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
>> >> > are mid-backfill) and 68 has nothing. Some data is lost unless you can
>> >> > recover another OSD with that PG.
>> >>
>> >> > The set of OSDs that might have data are: 6,10,33,72,84
>> >>
>> >> > If that bears no fruit, then you can force last_backfill to report
>> >> complete on one of those OSDs and it'll think it has all the data even
>> >> though some of it is likely gone. (We can pick one that is farther
>> >> along... 38 48 and 67 seem to all match.)
>>
>> Can you explain what you mean by 'force last_backfill to report
>> complete'? The current value for PG 1.60 is MAX and for 1.165 is
>> 1\/db616165\/rbd_data.ed9979641a9d82.000000000001dcee\/head
> ceph-objectstore-tool has a mark-complete operation. Do that on one of
> the OSDs that has the more advanced last_backfill (like the one above).
> After you restart, the PG should recover.
Here (https://pastebin.com/Jv2DpcB3) is the pg dump_stuck output from BEFORE running:
ceph-objectstore-tool --debug --op mark-complete --pgid 1.165 --data-path /var/lib/ceph/osd/ceph-48 --journal-path /var/lib/ceph/osd/ceph-48/journal --osd-use-stale-snap
As with the previous use of this tool, the data went away:
[root@stor5 /var/lib/ceph/osd/ceph-48]# du -sh current
20K current
[root@stor5 /var/lib/ceph/osd/ceph-48/current]# ls -R
.:
commit_op_seq nosnap omap/
./omap:
000011.log CURRENT LOCK LOG LOG.old MANIFEST-000010
After running ceph-objectstore-tool, it is:
ceph pg dump_stuck
ok
pg_stat state up up_primary acting acting_primary
1.39 active+remapped+backfilling [11,4,39] 11 [5,39,70] 5
1.1a9 active+remapped+backfilling [11,30,3] 11 [0,30,8] 0
1.b active+remapped+backfilling [11,36,94] 11 [38,97,70] 38
1.12f active+remapped+backfilling [14,11,47] 14 [14,5,69] 14
1.1d2 active+remapped+backfilling [11,2,38] 11 [0,36,49] 0
1.133 active+remapped+backfilling [42,11,83] 42 [42,89,21] 42
40.69 stale+active+undersized+degraded [48] 48 [48] 48
1.9d active+remapped+backfilling [39,2,11] 39 [39,2,86] 39
1.a2 active+remapped+backfilling [11,12,34] 11 [14,35,95] 14
1.10a active+remapped+backfilling [11,2,87] 11 [1,87,81] 1
1.70 active+remapped+backfilling [14,39,11] 14 [14,39,4] 14
1.60 down+remapped+peering [83,69,68] 83 [9] 9
1.eb active+remapped+backfilling [11,18,53] 11 [14,53,69] 14
1.8d active+remapped+backfilling [11,0,30] 11 [36,0,30] 36
1.118 active+remapped+backfilling [34,11,12] 34 [34,20,86] 34
1.121 active+remapped+backfilling [43,11,35] 43 [43,35,2] 43
1.177 active+remapped+backfilling [14,1,11] 14 [14,1,38] 14
1.17c active+remapped+backfilling [5,94,11] 5 [5,94,7] 5
1.16d active+remapped+backfilling [96,11,53] 96 [96,52,9] 96
1.19a active+remapped+backfilling [11,0,14] 11 [0,17,35] 0
1.165 down+peering [39,55,82] 39 [39,55,82] 39
1.1a active+remapped+backfilling [36,52,11] 36 [36,52,96] 36
1.e7 active+remapped+backfilling [11,35,44] 11 [34,44,9] 34
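As an aside, a quick way to tally how many stuck PGs are in each state is to count the second column of the `ceph pg dump_stuck` output; a small sketch, with a few rows from the listing above inlined as sample input:

```shell
# Count stuck PGs per state. Against a live cluster you would pipe:
#   ceph pg dump_stuck | awk 'NR>2 {print $2}' | sort | uniq -c | sort -rn
# Here a few rows from the listing above stand in as sample input.
awk '{print $2}' <<'EOF' | sort | uniq -c | sort -rn
1.39 active+remapped+backfilling [11,4,39] 11 [5,39,70] 5
1.60 down+remapped+peering [83,69,68] 83 [9] 9
1.165 down+peering [39,55,82] 39 [39,55,82] 39
1.1a active+remapped+backfilling [36,52,11] 36 [36,52,96] 36
EOF
```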
Is there any chance to rescue this cluster?
--
Regards,
Łukasz Chrustek
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-25 11:22 ` Łukasz Chrustek
@ 2017-05-29 15:31 ` Łukasz Chrustek
0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-29 15:31 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hello,
> ./omap:
> 000011.log CURRENT LOCK LOG LOG.old MANIFEST-000010
> After running ceph-objectstore-tool, it is:
> ceph pg dump_stuck
> ok
> pg_stat state up up_primary acting acting_primary
> 1.39 active+remapped+backfilling [11,4,39] 11 [5,39,70] 5
> 1.1a9 active+remapped+backfilling [11,30,3] 11 [0,30,8] 0
> 1.b active+remapped+backfilling [11,36,94] 11 [38,97,70] 38
> 1.12f active+remapped+backfilling [14,11,47] 14 [14,5,69] 14
> 1.1d2 active+remapped+backfilling [11,2,38] 11 [0,36,49] 0
> 1.133 active+remapped+backfilling [42,11,83] 42 [42,89,21] 42
> 40.69 stale+active+undersized+degraded [48] 48 [48] 48
> 1.9d active+remapped+backfilling [39,2,11] 39 [39,2,86] 39
> 1.a2 active+remapped+backfilling [11,12,34] 11 [14,35,95] 14
> 1.10a active+remapped+backfilling [11,2,87] 11 [1,87,81] 1
> 1.70 active+remapped+backfilling [14,39,11] 14 [14,39,4] 14
> 1.60 down+remapped+peering [83,69,68] 83 [9] 9
> 1.eb active+remapped+backfilling [11,18,53] 11 [14,53,69] 14
> 1.8d active+remapped+backfilling [11,0,30] 11 [36,0,30] 36
> 1.118 active+remapped+backfilling [34,11,12] 34 [34,20,86] 34
> 1.121 active+remapped+backfilling [43,11,35] 43 [43,35,2] 43
> 1.177 active+remapped+backfilling [14,1,11] 14 [14,1,38] 14
> 1.17c active+remapped+backfilling [5,94,11] 5 [5,94,7] 5
> 1.16d active+remapped+backfilling [96,11,53] 96 [96,52,9] 96
> 1.19a active+remapped+backfilling [11,0,14] 11 [0,17,35] 0
> 1.165 down+peering [39,55,82] 39 [39,55,82] 39
> 1.1a active+remapped+backfilling [36,52,11] 36 [36,52,96] 36
> 1.e7 active+remapped+backfilling [11,35,44] 11 [34,44,9] 34
> Is there any chance to rescue this cluster?
I have now turned off all OSDs and MONs; after that I turned two of the
three MONs back on to form a quorum. All ceph processes on the OSD hosts
are off, but ceph osd tree still shows old/stale data: https://pastebin.com/pVGLxAPs
Why doesn't ceph see that all OSDs are down? What could be blocking it
like this?
--
Regards,
Łukasz Chrustek
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-24 22:46 ` Łukasz Chrustek
2017-05-25 2:06 ` Sage Weil
@ 2017-05-30 13:21 ` Sage Weil
2017-06-10 22:45 ` Łukasz Chrustek
1 sibling, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-30 13:21 UTC (permalink / raw)
To: Łukasz Chrustek; +Cc: ceph-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2451 bytes --]
On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Hello,
>
> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
> >> Hello,
> >>
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Hello,
> >> >>
> >> >> >>
> >> >> >> > This
> >> >> >>
> >> >> >> osd 6 - isn't startable
> >> >>
> >> >> > Disk completely 100% dead, or just broken enough that ceph-osd won't
> >> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> >> > from this osd to recover any important writes on that osd.
> >> >>
> >> >> >> osd 10, 37, 72 are startable
> >> >>
> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> >> > query to confirm that it still wants just osd.6.
> >> >>
> >> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> >> > random osd (not one of these ones), import the pg into that osd, and start
> >> >> > again. once it is up, 'ceph osd lost 6'. the pg *should* peer at that
> >> >> > point. repeat with the same basic process with the other pg.
> >> >>
> >> >> Here is the output from ceph-objectstore-tool - it also didn't succeed:
> >> >>
> >> >> https://pastebin.com/7XGAHdKH
> >>
> >> > Hmm, btrfs:
> >>
> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1
> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> >> > losing new data
> >>
> >> > You could try setting --osd-use-stale-snap as suggested.
> >>
> >> Yes... tried... and I simply get rided of 39GB data...
>
> > What does "get rided" mean?
>
> according to this pastebin: https://pastebin.com/QPcpkjg4
>
> ls -R /var/lib/ceph/osd/ceph-33/current/
>
> /var/lib/ceph/osd/ceph-33/current/:
>
> commit_op_seq omap
>
>
>
> /var/lib/ceph/osd/ceph-33/current/omap:
>
> 000003.log CURRENT LOCK MANIFEST-000002
>
> earlier, the data files were there.
Okay, sorry I took a while to get back to you. It looks like I gave
you bad advice here! The 'nosnap' file means filestore was
operating in non-snapshotting mode, and the --osd-use-stale-snap
warning that it would lose data was real... it rolled back to an empty
state and threw out the data on the device. :( :( I'm *very* sorry about
this! I haven't looked at or worked with the btrfs mode in ages (we
don't recommend it and almost nobody uses it) but I should have been
paying close attention.
What is the state of the cluster now?
sage
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Problem with query and any operation on PGs
2017-05-30 13:21 ` Sage Weil
@ 2017-06-10 22:45 ` Łukasz Chrustek
0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-06-10 22:45 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi Sage,
> Okay, sorry I took a while to get back to you.
Sorry too - most of the time I was focused on this problem.
> It looks like I gave
> you bad advice here! The 'nosnap' file means filestore was
> operating in non-snapshotting mode, and the --osd-use-stale-snap
> warning that it would lose data was real... it rolled back to an empty
> state and threw out the data on the device. :( :( I'm *very* sorry about
> this! I haven't looked at or worked with the btrfs mode in ages (we
> don't recommend it and almost nobody uses it) but I should have been
> paying close attention.
Thank you for your time and effort; it was important to have such
help. There were many errors in the setup of this cluster. We didn't
realize that there could be so many strange things which were f...ed
up...
> What is the state of the cluster now?
The cluster is dead. After a few more days of fighting with it we decided
to shut it down. We fixed the scripts for recovering volumes from a
turned-off ceph cluster (this one:
https://github.com/cmgitdream/ceph-rbd-recover-tool) and made them
work with the jewel release (10.2.7). I set up a brand new cluster on
other hardware and the images are now being imported into it. With some
direct editing of MySQL in the OpenStack databases we didn't have to
change anything for our clients from the Horizon point of view. Once the
dust settles, we will push our changes to GitHub for this tool.
After the migration finishes we will try to bring the dead cluster back
up and take some more aggressive action to make it work anyway.
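For anyone following a similar path, the per-image import into the new cluster is roughly as follows (a sketch; the file path is illustrative, the image name is one from earlier in this thread, and the extraction from the dead cluster's disks is done by the recovery tool itself):

```shell
# After extracting an image from the dead cluster's OSDs with the
# recovery tool, import the resulting file into the new cluster:
rbd import /recovered/volume-197602d7.img \
    volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497

# and verify it landed:
rbd info volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
```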
--
Regards,
Lukasz
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2017-06-10 22:45 UTC | newest]
Thread overview: 35+ messages
-- links below jump to the message on this page --
[not found] <175484591.20170523135449@tlen.pl>
2017-05-23 12:48 ` Problem with query and any operation on PGs Łukasz Chrustek
2017-05-23 14:17 ` Sage Weil
2017-05-23 14:43 ` Łukasz Chrustek
[not found] ` <1464688590.20170523185052@tlen.pl>
2017-05-23 17:40 ` Sage Weil
2017-05-23 21:43 ` Łukasz Chrustek
2017-05-23 21:48 ` Sage Weil
2017-05-24 13:19 ` Łukasz Chrustek
2017-05-24 13:37 ` Sage Weil
2017-05-24 13:58 ` Łukasz Chrustek
2017-05-24 14:02 ` Sage Weil
2017-05-24 14:18 ` Łukasz Chrustek
2017-05-24 14:47 ` Sage Weil
2017-05-24 15:00 ` Łukasz Chrustek
2017-05-24 15:07 ` Łukasz Chrustek
2017-05-24 15:11 ` Sage Weil
2017-05-24 15:24 ` Łukasz Chrustek
2017-05-24 15:54 ` Łukasz Chrustek
2017-05-24 16:02 ` Łukasz Chrustek
2017-05-24 17:07 ` Łukasz Chrustek
2017-05-24 17:16 ` Sage Weil
2017-05-24 17:28 ` Łukasz Chrustek
2017-05-24 18:16 ` Sage Weil
2017-05-24 19:47 ` Łukasz Chrustek
2017-05-24 17:30 ` Łukasz Chrustek
2017-05-24 17:35 ` Łukasz Chrustek
2017-05-24 21:38 ` Łukasz Chrustek
2017-05-24 21:53 ` Sage Weil
2017-05-24 22:09 ` Łukasz Chrustek
2017-05-24 22:27 ` Sage Weil
2017-05-24 22:46 ` Łukasz Chrustek
2017-05-25 2:06 ` Sage Weil
2017-05-25 11:22 ` Łukasz Chrustek
2017-05-29 15:31 ` Łukasz Chrustek
2017-05-30 13:21 ` Sage Weil
2017-06-10 22:45 ` Łukasz Chrustek