From: Sage Weil <sage@newdream.net>
To: "Łukasz Chrustek" <skidoo@tlen.pl>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Problem with query and any operation on PGs
Date: Tue, 23 May 2017 14:17:38 +0000 (UTC)
Message-ID: <alpine.DEB.2.11.1705231415400.3646@piezo.novalocal>
In-Reply-To: <483467685.20170523144818@tlen.pl>
On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Hello,
>
>
> After a terrible outage caused by the failure of a 10Gbit switch, the ceph
> cluster went to HEALTH_ERR (three whole storage servers went offline at the
> same time and didn't come back quickly). After cluster recovery two PGs went
> into incomplete state; I can't query them, and can't do anything with them,
The thing where you can't query a PG is because the OSD is throttling
incoming work and the throttle is exhausted (the PG can't do work so it
isn't making progress). A workaround for jewel is to restart the OSD
serving the PG and do the query quickly after that (probably in a loop so
that you catch it after it starts up but before the throttle is
exhausted again). (In luminous this is fixed.)
Once you have the query output ('ceph tell $pgid query') you'll be able to
tell what is preventing the PG from peering.
You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.
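The restart-then-query workaround above could be sketched as a small shell
loop (a sketch only, under assumptions: the 'ceph' CLI is in PATH, the PG id
is passed as an argument, and the 60-attempt cap is arbitrary):

```shell
#!/bin/sh
# Sketch of the workaround: restart the OSD serving the PG, then
# immediately poll the query so it lands before the incoming-op
# throttle is exhausted again.
#
# First find the OSDs hosting the PG (the first in the acting set
# is the primary):
#   ceph pg map 1.165
# Restart the primary OSD on its host, e.g.:
#   systemctl restart ceph-osd@37
# Then run this right away:
query_pg_until_ok() {
    pgid=$1
    tries=0
    # Retry until 'ceph tell <pgid> query' succeeds (exit status 0).
    until ceph tell "$pgid" query; do
        tries=$((tries + 1))
        [ "$tries" -ge 60 ] && return 1   # give up after ~60 attempts
        sleep 1
    done
}
```

Here osd.37 is taken from the "last acting [37]" line for pg 1.165 in the
health output below; substitute whichever OSD 'ceph pg map' reports as
primary for your PG.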
HTH!
sage
> which would bring the cluster back to a working state. Here is an strace of
> this command: https://pastebin.com/HpNFvR8Z. But... this cluster isn't entirely off:
>
> [root@cc1 ~]# rbd ls management-vms
> os-mongodb1
> os-mongodb1-database
> os-gitlab-root
> os-mongodb1-database2
> os-wiki-root
> [root@cc1 ~]# rbd ls volumes
> ^C
> [root@cc1 ~]#
>
> and the same for each mon host (not pasting all three here):
>
> [root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
> os-mongodb1
> os-mongodb1-database
> os-gitlab-root
> os-mongodb1-database2
> os-wiki-root
> [root@cc1 ~]# rbd -m 192.168.128.1 list volumes
> ^C
> [root@cc1 ~]#
>
> and for all other pools on the list I can list images, except for the
> (most important) volumes pool.
>
> Funny thing, I can list rbd info for a particular image:
>
> [root@cc1 ~]# rbd info
> volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
> rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
> size 20480 MB in 1280 objects
> order 24 (16384 kB objects)
> block_name_prefix: rbd_data.64a21a0a9acf52
> format: 2
> features: layering
> flags:
> parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
> overlap: 3072 MB
>
> but I can't list the whole content of the volumes pool.
>
> [root@cc1 ~]# ceph osd pool ls
> volumes
> images
> backups
> volumes-ssd-intel-s3700
> management-vms
> .rgw.root
> .rgw.control
> .rgw
> .rgw.gc
> .log
> .users.uid
> .rgw.buckets.index
> .users
> .rgw.buckets.extra
> .rgw.buckets
> volumes-cached
> cache-ssd
>
> here is ceph osd tree:
>
> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -7 20.88388 root ssd-intel-s3700
> -11 3.19995 host ssd-stor1
> 56 0.79999 osd.56 up 1.00000 1.00000
> 57 0.79999 osd.57 up 1.00000 1.00000
> 58 0.79999 osd.58 up 1.00000 1.00000
> 59 0.79999 osd.59 up 1.00000 1.00000
> -9 2.12999 host ssd-stor2
> 60 0.70999 osd.60 up 1.00000 1.00000
> 61 0.70999 osd.61 up 1.00000 1.00000
> 62 0.70999 osd.62 up 1.00000 1.00000
> -8 2.12999 host ssd-stor3
> 63 0.70999 osd.63 up 1.00000 1.00000
> 64 0.70999 osd.64 up 1.00000 1.00000
> 65 0.70999 osd.65 up 1.00000 1.00000
> -10 4.19998 host ssd-stor4
> 25 0.70000 osd.25 up 1.00000 1.00000
> 26 0.70000 osd.26 up 1.00000 1.00000
> 27 0.70000 osd.27 up 1.00000 1.00000
> 28 0.70000 osd.28 up 1.00000 1.00000
> 29 0.70000 osd.29 up 1.00000 1.00000
> 24 0.70000 osd.24 up 1.00000 1.00000
> -12 3.41199 host ssd-stor5
> 73 0.85300 osd.73 up 1.00000 1.00000
> 74 0.85300 osd.74 up 1.00000 1.00000
> 75 0.85300 osd.75 up 1.00000 1.00000
> 76 0.85300 osd.76 up 1.00000 1.00000
> -13 3.41199 host ssd-stor6
> 77 0.85300 osd.77 up 1.00000 1.00000
> 78 0.85300 osd.78 up 1.00000 1.00000
> 79 0.85300 osd.79 up 1.00000 1.00000
> 80 0.85300 osd.80 up 1.00000 1.00000
> -15 2.39999 host ssd-stor7
> 90 0.79999 osd.90 up 1.00000 1.00000
> 91 0.79999 osd.91 up 1.00000 1.00000
> 92 0.79999 osd.92 up 1.00000 1.00000
> -1 167.69969 root default
> -2 33.99994 host stor1
> 6 3.39999 osd.6 down 0 1.00000
> 7 3.39999 osd.7 up 1.00000 1.00000
> 8 3.39999 osd.8 up 1.00000 1.00000
> 9 3.39999 osd.9 up 1.00000 1.00000
> 10 3.39999 osd.10 down 0 1.00000
> 11 3.39999 osd.11 down 0 1.00000
> 69 3.39999 osd.69 up 1.00000 1.00000
> 70 3.39999 osd.70 up 1.00000 1.00000
> 71 3.39999 osd.71 down 0 1.00000
> 81 3.39999 osd.81 up 1.00000 1.00000
> -3 20.99991 host stor2
> 13 2.09999 osd.13 up 1.00000 1.00000
> 12 2.09999 osd.12 up 1.00000 1.00000
> 14 2.09999 osd.14 up 1.00000 1.00000
> 15 2.09999 osd.15 up 1.00000 1.00000
> 16 2.09999 osd.16 up 1.00000 1.00000
> 17 2.09999 osd.17 up 1.00000 1.00000
> 18 2.09999 osd.18 down 0 1.00000
> 19 2.09999 osd.19 up 1.00000 1.00000
> 20 2.09999 osd.20 up 1.00000 1.00000
> 21 2.09999 osd.21 up 1.00000 1.00000
> -4 25.00000 host stor3
> 30 2.50000 osd.30 up 1.00000 1.00000
> 31 2.50000 osd.31 up 1.00000 1.00000
> 32 2.50000 osd.32 up 1.00000 1.00000
> 33 2.50000 osd.33 down 0 1.00000
> 34 2.50000 osd.34 up 1.00000 1.00000
> 35 2.50000 osd.35 up 1.00000 1.00000
> 66 2.50000 osd.66 up 1.00000 1.00000
> 67 2.50000 osd.67 up 1.00000 1.00000
> 68 2.50000 osd.68 up 1.00000 1.00000
> 72 2.50000 osd.72 down 0 1.00000
> -5 25.00000 host stor4
> 44 2.50000 osd.44 up 1.00000 1.00000
> 45 2.50000 osd.45 up 1.00000 1.00000
> 46 2.50000 osd.46 down 0 1.00000
> 47 2.50000 osd.47 up 1.00000 1.00000
> 0 2.50000 osd.0 up 1.00000 1.00000
> 1 2.50000 osd.1 up 1.00000 1.00000
> 2 2.50000 osd.2 up 1.00000 1.00000
> 3 2.50000 osd.3 up 1.00000 1.00000
> 4 2.50000 osd.4 up 1.00000 1.00000
> 5 2.50000 osd.5 up 1.00000 1.00000
> -6 14.19991 host stor5
> 48 1.79999 osd.48 up 1.00000 1.00000
> 49 1.59999 osd.49 up 1.00000 1.00000
> 50 1.79999 osd.50 up 1.00000 1.00000
> 51 1.79999 osd.51 down 0 1.00000
> 52 1.79999 osd.52 up 1.00000 1.00000
> 53 1.79999 osd.53 up 1.00000 1.00000
> 54 1.79999 osd.54 up 1.00000 1.00000
> 55 1.79999 osd.55 up 1.00000 1.00000
> -14 14.39999 host stor6
> 82 1.79999 osd.82 up 1.00000 1.00000
> 83 1.79999 osd.83 up 1.00000 1.00000
> 84 1.79999 osd.84 up 1.00000 1.00000
> 85 1.79999 osd.85 up 1.00000 1.00000
> 86 1.79999 osd.86 up 1.00000 1.00000
> 87 1.79999 osd.87 up 1.00000 1.00000
> 88 1.79999 osd.88 up 1.00000 1.00000
> 89 1.79999 osd.89 up 1.00000 1.00000
> -16 12.59999 host stor7
> 93 1.79999 osd.93 up 1.00000 1.00000
> 94 1.79999 osd.94 up 1.00000 1.00000
> 95 1.79999 osd.95 up 1.00000 1.00000
> 96 1.79999 osd.96 up 1.00000 1.00000
> 97 1.79999 osd.97 up 1.00000 1.00000
> 98 1.79999 osd.98 up 1.00000 1.00000
> 99 1.79999 osd.99 up 1.00000 1.00000
> -17 21.49995 host stor8
> 22 1.59999 osd.22 up 1.00000 1.00000
> 23 1.59999 osd.23 up 1.00000 1.00000
> 36 2.09999 osd.36 up 1.00000 1.00000
> 37 2.09999 osd.37 up 1.00000 1.00000
> 38 2.50000 osd.38 up 1.00000 1.00000
> 39 2.50000 osd.39 up 1.00000 1.00000
> 40 2.50000 osd.40 up 1.00000 1.00000
> 41 2.50000 osd.41 down 0 1.00000
> 42 2.50000 osd.42 up 1.00000 1.00000
> 43 1.59999 osd.43 up 1.00000 1.00000
> [root@cc1 ~]#
>
> and ceph health detail:
>
> ceph health detail | grep down
> HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
> peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
> undersized; recovery 176211/14148564 objects degraded (1.245%);
> recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
> pg 1.60 is stuck inactive since forever, current state
> down+remapped+peering, last acting [66,69,40]
> pg 1.165 is stuck inactive since forever, current state
> down+remapped+peering, last acting [37]
> pg 1.60 is stuck unclean since forever, current state
> down+remapped+peering, last acting [66,69,40]
> pg 1.165 is stuck unclean since forever, current state
> down+remapped+peering, last acting [37]
> pg 1.165 is down+remapped+peering, acting [37]
> pg 1.60 is down+remapped+peering, acting [66,69,40]
>
>
> problematic pgs are 1.165 and 1.60.
>
> Please advise how to unblock the volumes pool and/or make these two PGs
> work. Over the last night and day, while trying to solve this issue, we
> confirmed these PGs are 100% empty of data.
>
>
>
>
> --
> Regards,
> Łukasz Chrustek
>