* Problem with query and any operation on PGs
       [not found] <175484591.20170523135449@tlen.pl>
@ 2017-05-23 12:48 ` Łukasz Chrustek
  2017-05-23 14:17   ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 12:48 UTC (permalink / raw)
  To: ceph-devel

Hello,

After a terrible outage caused by the failure of a 10 Gbit switch, the ceph cluster
went to HEALTH_ERR (three whole storage servers went offline at the same time
and didn't come back quickly). After the cluster recovered, two PGs ended up in an
incomplete state; I can't query them, and I can't do anything with them
that would get the cluster working again. Here is an strace of
the query command: https://pastebin.com/HpNFvR8Z. But... the cluster isn't entirely off:

[root@cc1 ~]# rbd ls management-vms
os-mongodb1
os-mongodb1-database
os-gitlab-root
os-mongodb1-database2
os-wiki-root
[root@cc1 ~]# rbd ls volumes
^C
[root@cc1 ~]#

and the same for all mon hosts (I don't paste all three here):

[root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
os-mongodb1
os-mongodb1-database
os-gitlab-root
os-mongodb1-database2
os-wiki-root
[root@cc1 ~]# rbd -m 192.168.128.1 list volumes
^C
[root@cc1 ~]#

and for all other pools on the list I can list images, except for the
(most important) volumes pool.

Funny thing, I can get rbd info for a particular image:

[root@cc1 ~]# rbd info
volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
        size 20480 MB in 1280 objects
        order 24 (16384 kB objects)
        block_name_prefix: rbd_data.64a21a0a9acf52
        format: 2
        features: layering
        flags:
        parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
        overlap: 3072 MB

but I can't list the whole content of the volumes pool.

[root@cc1 ~]# ceph osd pool ls
volumes
images
backups
volumes-ssd-intel-s3700
management-vms
.rgw.root
.rgw.control
.rgw
.rgw.gc
.log
.users.uid
.rgw.buckets.index
.users
.rgw.buckets.extra
.rgw.buckets
volumes-cached
cache-ssd

here is ceph osd tree:

ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7  20.88388 root ssd-intel-s3700
-11   3.19995     host ssd-stor1
 56   0.79999         osd.56            up  1.00000          1.00000
 57   0.79999         osd.57            up  1.00000          1.00000
 58   0.79999         osd.58            up  1.00000          1.00000
 59   0.79999         osd.59            up  1.00000          1.00000
 -9   2.12999     host ssd-stor2
 60   0.70999         osd.60            up  1.00000          1.00000
 61   0.70999         osd.61            up  1.00000          1.00000
 62   0.70999         osd.62            up  1.00000          1.00000
 -8   2.12999     host ssd-stor3
 63   0.70999         osd.63            up  1.00000          1.00000
 64   0.70999         osd.64            up  1.00000          1.00000
 65   0.70999         osd.65            up  1.00000          1.00000
-10   4.19998     host ssd-stor4
 25   0.70000         osd.25            up  1.00000          1.00000
 26   0.70000         osd.26            up  1.00000          1.00000
 27   0.70000         osd.27            up  1.00000          1.00000
 28   0.70000         osd.28            up  1.00000          1.00000
 29   0.70000         osd.29            up  1.00000          1.00000
 24   0.70000         osd.24            up  1.00000          1.00000
-12   3.41199     host ssd-stor5
 73   0.85300         osd.73            up  1.00000          1.00000
 74   0.85300         osd.74            up  1.00000          1.00000
 75   0.85300         osd.75            up  1.00000          1.00000
 76   0.85300         osd.76            up  1.00000          1.00000
-13   3.41199     host ssd-stor6
 77   0.85300         osd.77            up  1.00000          1.00000
 78   0.85300         osd.78            up  1.00000          1.00000
 79   0.85300         osd.79            up  1.00000          1.00000
 80   0.85300         osd.80            up  1.00000          1.00000
-15   2.39999     host ssd-stor7
 90   0.79999         osd.90            up  1.00000          1.00000
 91   0.79999         osd.91            up  1.00000          1.00000
 92   0.79999         osd.92            up  1.00000          1.00000
 -1 167.69969 root default
 -2  33.99994     host stor1
  6   3.39999         osd.6           down        0          1.00000
  7   3.39999         osd.7             up  1.00000          1.00000
  8   3.39999         osd.8             up  1.00000          1.00000
  9   3.39999         osd.9             up  1.00000          1.00000
 10   3.39999         osd.10          down        0          1.00000
 11   3.39999         osd.11          down        0          1.00000
 69   3.39999         osd.69            up  1.00000          1.00000
 70   3.39999         osd.70            up  1.00000          1.00000
 71   3.39999         osd.71          down        0          1.00000
 81   3.39999         osd.81            up  1.00000          1.00000
 -3  20.99991     host stor2
 13   2.09999         osd.13            up  1.00000          1.00000
 12   2.09999         osd.12            up  1.00000          1.00000
 14   2.09999         osd.14            up  1.00000          1.00000
 15   2.09999         osd.15            up  1.00000          1.00000
 16   2.09999         osd.16            up  1.00000          1.00000
 17   2.09999         osd.17            up  1.00000          1.00000
 18   2.09999         osd.18          down        0          1.00000
 19   2.09999         osd.19            up  1.00000          1.00000
 20   2.09999         osd.20            up  1.00000          1.00000
 21   2.09999         osd.21            up  1.00000          1.00000
 -4  25.00000     host stor3
 30   2.50000         osd.30            up  1.00000          1.00000
 31   2.50000         osd.31            up  1.00000          1.00000
 32   2.50000         osd.32            up  1.00000          1.00000
 33   2.50000         osd.33          down        0          1.00000
 34   2.50000         osd.34            up  1.00000          1.00000
 35   2.50000         osd.35            up  1.00000          1.00000
 66   2.50000         osd.66            up  1.00000          1.00000
 67   2.50000         osd.67            up  1.00000          1.00000
 68   2.50000         osd.68            up  1.00000          1.00000
 72   2.50000         osd.72          down        0          1.00000
 -5  25.00000     host stor4
 44   2.50000         osd.44            up  1.00000          1.00000
 45   2.50000         osd.45            up  1.00000          1.00000
 46   2.50000         osd.46          down        0          1.00000
 47   2.50000         osd.47            up  1.00000          1.00000
  0   2.50000         osd.0             up  1.00000          1.00000
  1   2.50000         osd.1             up  1.00000          1.00000
  2   2.50000         osd.2             up  1.00000          1.00000
  3   2.50000         osd.3             up  1.00000          1.00000
  4   2.50000         osd.4             up  1.00000          1.00000
  5   2.50000         osd.5             up  1.00000          1.00000
 -6  14.19991     host stor5
 48   1.79999         osd.48            up  1.00000          1.00000
 49   1.59999         osd.49            up  1.00000          1.00000
 50   1.79999         osd.50            up  1.00000          1.00000
 51   1.79999         osd.51          down        0          1.00000
 52   1.79999         osd.52            up  1.00000          1.00000
 53   1.79999         osd.53            up  1.00000          1.00000
 54   1.79999         osd.54            up  1.00000          1.00000
 55   1.79999         osd.55            up  1.00000          1.00000
-14  14.39999     host stor6
 82   1.79999         osd.82            up  1.00000          1.00000
 83   1.79999         osd.83            up  1.00000          1.00000
 84   1.79999         osd.84            up  1.00000          1.00000
 85   1.79999         osd.85            up  1.00000          1.00000
 86   1.79999         osd.86            up  1.00000          1.00000
 87   1.79999         osd.87            up  1.00000          1.00000
 88   1.79999         osd.88            up  1.00000          1.00000
 89   1.79999         osd.89            up  1.00000          1.00000
-16  12.59999     host stor7
 93   1.79999         osd.93            up  1.00000          1.00000
 94   1.79999         osd.94            up  1.00000          1.00000
 95   1.79999         osd.95            up  1.00000          1.00000
 96   1.79999         osd.96            up  1.00000          1.00000
 97   1.79999         osd.97            up  1.00000          1.00000
 98   1.79999         osd.98            up  1.00000          1.00000
 99   1.79999         osd.99            up  1.00000          1.00000
-17  21.49995     host stor8
 22   1.59999         osd.22            up  1.00000          1.00000
 23   1.59999         osd.23            up  1.00000          1.00000
 36   2.09999         osd.36            up  1.00000          1.00000
 37   2.09999         osd.37            up  1.00000          1.00000
 38   2.50000         osd.38            up  1.00000          1.00000
 39   2.50000         osd.39            up  1.00000          1.00000
 40   2.50000         osd.40            up  1.00000          1.00000
 41   2.50000         osd.41          down        0          1.00000
 42   2.50000         osd.42            up  1.00000          1.00000
 43   1.59999         osd.43            up  1.00000          1.00000
[root@cc1 ~]#

and ceph health detail:

ceph health detail | grep down
HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
undersized; recovery 176211/14148564 objects degraded (1.245%);
recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
pg 1.60 is stuck inactive since forever, current state
down+remapped+peering, last acting [66,69,40]
pg 1.165 is stuck inactive since forever, current state
down+remapped+peering, last acting [37]
pg 1.60 is stuck unclean since forever, current state
down+remapped+peering, last acting [66,69,40]
pg 1.165 is stuck unclean since forever, current state
down+remapped+peering, last acting [37]
pg 1.165 is down+remapped+peering, acting [37]
pg 1.60 is down+remapped+peering, acting [66,69,40]


problematic pgs are 1.165 and 1.60.

Please advise how to unblock the volumes pool and/or get these two PGs
working - during the last night and day, when we tried to solve this issue,
these PGs ended up 100% empty of data.




-- 
Regards,
 Łukasz Chrustek



* Re: Problem with query and any operation on PGs
  2017-05-23 12:48 ` Problem with query and any operation on PGs Łukasz Chrustek
@ 2017-05-23 14:17   ` Sage Weil
  2017-05-23 14:43     ` Łukasz Chrustek
       [not found]     ` <1464688590.20170523185052@tlen.pl>
  0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-23 14:17 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel


On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Hello,
> 
> After a terrible outage caused by the failure of a 10 Gbit switch, the ceph cluster
> went to HEALTH_ERR (three whole storage servers went offline at the same time
> and didn't come back quickly). After the cluster recovered, two PGs ended up in an
> incomplete state; I can't query them, and I can't do anything with them

The thing where you can't query a PG is because the OSD is throttling 
incoming work and the throttle is exhausted (the PG can't do work so it 
isn't making progress).  A workaround for jewel is to restart the OSD 
serving the PG and do the query quickly after that (probably in a loop so 
that you catch it after it starts up but before the throttle is 
exhausted again).  (In luminous this is fixed.)

Once you have the query output ('ceph tell $pgid query') you'll be able to 
tell what is preventing the PG from peering.

You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.
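
For example, something along these lines (just a rough sketch, assuming
systemd-managed OSDs; substitute your real pg id and the primary osd id
reported by 'ceph pg map'):

  pgid=1.165
  osd=37                                  # primary from 'ceph pg map $pgid'
  systemctl restart ceph-osd@$osd         # run on the host that owns the osd
  # query repeatedly right after startup, before the throttle fills up again
  while ! timeout 10 ceph tell $pgid query > /tmp/query.$pgid.json; do
      sleep 1
  done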

HTH!
sage


> that would get the cluster working again. Here is an strace of
> the query command: https://pastebin.com/HpNFvR8Z. But... the cluster isn't entirely off:
> 
> [root@cc1 ~]# rbd ls management-vms
> os-mongodb1
> os-mongodb1-database
> os-gitlab-root
> os-mongodb1-database2
> os-wiki-root
> [root@cc1 ~]# rbd ls volumes
> ^C
> [root@cc1 ~]#
> 
> and the same for all mon hosts (I don't paste all three here):
> 
> [root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
> os-mongodb1
> os-mongodb1-database
> os-gitlab-root
> os-mongodb1-database2
> os-wiki-root
> [root@cc1 ~]# rbd -m 192.168.128.1 list volumes
> ^C
> [root@cc1 ~]#
> 
> and for all other pools on the list I can list images, except for the
> (most important) volumes pool.
> 
> Funny thing, I can get rbd info for a particular image:
> 
> [root@cc1 ~]# rbd info
> volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
> rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
>         size 20480 MB in 1280 objects
>         order 24 (16384 kB objects)
>         block_name_prefix: rbd_data.64a21a0a9acf52
>         format: 2
>         features: layering
>         flags:
>         parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
>         overlap: 3072 MB
> 
> but I can't list the whole content of the volumes pool.
> 
> [root@cc1 ~]# ceph osd pool ls
> volumes
> images
> backups
> volumes-ssd-intel-s3700
> management-vms
> .rgw.root
> .rgw.control
> .rgw
> .rgw.gc
> .log
> .users.uid
> .rgw.buckets.index
> .users
> .rgw.buckets.extra
> .rgw.buckets
> volumes-cached
> cache-ssd
> 
> here is ceph osd tree:
> 
> ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -7  20.88388 root ssd-intel-s3700
> -11   3.19995     host ssd-stor1
>  56   0.79999         osd.56            up  1.00000          1.00000
>  57   0.79999         osd.57            up  1.00000          1.00000
>  58   0.79999         osd.58            up  1.00000          1.00000
>  59   0.79999         osd.59            up  1.00000          1.00000
>  -9   2.12999     host ssd-stor2
>  60   0.70999         osd.60            up  1.00000          1.00000
>  61   0.70999         osd.61            up  1.00000          1.00000
>  62   0.70999         osd.62            up  1.00000          1.00000
>  -8   2.12999     host ssd-stor3
>  63   0.70999         osd.63            up  1.00000          1.00000
>  64   0.70999         osd.64            up  1.00000          1.00000
>  65   0.70999         osd.65            up  1.00000          1.00000
> -10   4.19998     host ssd-stor4
>  25   0.70000         osd.25            up  1.00000          1.00000
>  26   0.70000         osd.26            up  1.00000          1.00000
>  27   0.70000         osd.27            up  1.00000          1.00000
>  28   0.70000         osd.28            up  1.00000          1.00000
>  29   0.70000         osd.29            up  1.00000          1.00000
>  24   0.70000         osd.24            up  1.00000          1.00000
> -12   3.41199     host ssd-stor5
>  73   0.85300         osd.73            up  1.00000          1.00000
>  74   0.85300         osd.74            up  1.00000          1.00000
>  75   0.85300         osd.75            up  1.00000          1.00000
>  76   0.85300         osd.76            up  1.00000          1.00000
> -13   3.41199     host ssd-stor6
>  77   0.85300         osd.77            up  1.00000          1.00000
>  78   0.85300         osd.78            up  1.00000          1.00000
>  79   0.85300         osd.79            up  1.00000          1.00000
>  80   0.85300         osd.80            up  1.00000          1.00000
> -15   2.39999     host ssd-stor7
>  90   0.79999         osd.90            up  1.00000          1.00000
>  91   0.79999         osd.91            up  1.00000          1.00000
>  92   0.79999         osd.92            up  1.00000          1.00000
>  -1 167.69969 root default
>  -2  33.99994     host stor1
>   6   3.39999         osd.6           down        0          1.00000
>   7   3.39999         osd.7             up  1.00000          1.00000
>   8   3.39999         osd.8             up  1.00000          1.00000
>   9   3.39999         osd.9             up  1.00000          1.00000
>  10   3.39999         osd.10          down        0          1.00000
>  11   3.39999         osd.11          down        0          1.00000
>  69   3.39999         osd.69            up  1.00000          1.00000
>  70   3.39999         osd.70            up  1.00000          1.00000
>  71   3.39999         osd.71          down        0          1.00000
>  81   3.39999         osd.81            up  1.00000          1.00000
>  -3  20.99991     host stor2
>  13   2.09999         osd.13            up  1.00000          1.00000
>  12   2.09999         osd.12            up  1.00000          1.00000
>  14   2.09999         osd.14            up  1.00000          1.00000
>  15   2.09999         osd.15            up  1.00000          1.00000
>  16   2.09999         osd.16            up  1.00000          1.00000
>  17   2.09999         osd.17            up  1.00000          1.00000
>  18   2.09999         osd.18          down        0          1.00000
>  19   2.09999         osd.19            up  1.00000          1.00000
>  20   2.09999         osd.20            up  1.00000          1.00000
>  21   2.09999         osd.21            up  1.00000          1.00000
>  -4  25.00000     host stor3
>  30   2.50000         osd.30            up  1.00000          1.00000
>  31   2.50000         osd.31            up  1.00000          1.00000
>  32   2.50000         osd.32            up  1.00000          1.00000
>  33   2.50000         osd.33          down        0          1.00000
>  34   2.50000         osd.34            up  1.00000          1.00000
>  35   2.50000         osd.35            up  1.00000          1.00000
>  66   2.50000         osd.66            up  1.00000          1.00000
>  67   2.50000         osd.67            up  1.00000          1.00000
>  68   2.50000         osd.68            up  1.00000          1.00000
>  72   2.50000         osd.72          down        0          1.00000
>  -5  25.00000     host stor4
>  44   2.50000         osd.44            up  1.00000          1.00000
>  45   2.50000         osd.45            up  1.00000          1.00000
>  46   2.50000         osd.46          down        0          1.00000
>  47   2.50000         osd.47            up  1.00000          1.00000
>   0   2.50000         osd.0             up  1.00000          1.00000
>   1   2.50000         osd.1             up  1.00000          1.00000
>   2   2.50000         osd.2             up  1.00000          1.00000
>   3   2.50000         osd.3             up  1.00000          1.00000
>   4   2.50000         osd.4             up  1.00000          1.00000
>   5   2.50000         osd.5             up  1.00000          1.00000
>  -6  14.19991     host stor5
>  48   1.79999         osd.48            up  1.00000          1.00000
>  49   1.59999         osd.49            up  1.00000          1.00000
>  50   1.79999         osd.50            up  1.00000          1.00000
>  51   1.79999         osd.51          down        0          1.00000
>  52   1.79999         osd.52            up  1.00000          1.00000
>  53   1.79999         osd.53            up  1.00000          1.00000
>  54   1.79999         osd.54            up  1.00000          1.00000
>  55   1.79999         osd.55            up  1.00000          1.00000
> -14  14.39999     host stor6
>  82   1.79999         osd.82            up  1.00000          1.00000
>  83   1.79999         osd.83            up  1.00000          1.00000
>  84   1.79999         osd.84            up  1.00000          1.00000
>  85   1.79999         osd.85            up  1.00000          1.00000
>  86   1.79999         osd.86            up  1.00000          1.00000
>  87   1.79999         osd.87            up  1.00000          1.00000
>  88   1.79999         osd.88            up  1.00000          1.00000
>  89   1.79999         osd.89            up  1.00000          1.00000
> -16  12.59999     host stor7
>  93   1.79999         osd.93            up  1.00000          1.00000
>  94   1.79999         osd.94            up  1.00000          1.00000
>  95   1.79999         osd.95            up  1.00000          1.00000
>  96   1.79999         osd.96            up  1.00000          1.00000
>  97   1.79999         osd.97            up  1.00000          1.00000
>  98   1.79999         osd.98            up  1.00000          1.00000
>  99   1.79999         osd.99            up  1.00000          1.00000
> -17  21.49995     host stor8
>  22   1.59999         osd.22            up  1.00000          1.00000
>  23   1.59999         osd.23            up  1.00000          1.00000
>  36   2.09999         osd.36            up  1.00000          1.00000
>  37   2.09999         osd.37            up  1.00000          1.00000
>  38   2.50000         osd.38            up  1.00000          1.00000
>  39   2.50000         osd.39            up  1.00000          1.00000
>  40   2.50000         osd.40            up  1.00000          1.00000
>  41   2.50000         osd.41          down        0          1.00000
>  42   2.50000         osd.42            up  1.00000          1.00000
>  43   1.59999         osd.43            up  1.00000          1.00000
> [root@cc1 ~]#
> 
> and ceph health detail:
> 
> ceph health detail | grep down
> HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
> peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
> undersized; recovery 176211/14148564 objects degraded (1.245%);
> recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
> pg 1.60 is stuck inactive since forever, current state
> down+remapped+peering, last acting [66,69,40]
> pg 1.165 is stuck inactive since forever, current state
> down+remapped+peering, last acting [37]
> pg 1.60 is stuck unclean since forever, current state
> down+remapped+peering, last acting [66,69,40]
> pg 1.165 is stuck unclean since forever, current state
> down+remapped+peering, last acting [37]
> pg 1.165 is down+remapped+peering, acting [37]
> pg 1.60 is down+remapped+peering, acting [66,69,40]
> 
> 
> problematic pgs are 1.165 and 1.60.
> 
> Please advise how to unblock the volumes pool and/or get these two PGs
> working - during the last night and day, when we tried to solve this issue,
> these PGs ended up 100% empty of data.
> 
> 
> 
> 
> -- 
> Regards,
>  Łukasz Chrustek
> 
> 
> 


* Re: Problem with query and any operation on PGs
  2017-05-23 14:17   ` Sage Weil
@ 2017-05-23 14:43     ` Łukasz Chrustek
       [not found]     ` <1464688590.20170523185052@tlen.pl>
  1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 14:43 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> Hello,
>> 
>> After a terrible outage caused by the failure of a 10 Gbit switch, the ceph cluster
>> went to HEALTH_ERR (three whole storage servers went offline at the same time
>> and didn't come back quickly). After the cluster recovered, two PGs ended up in an
>> incomplete state; I can't query them, and I can't do anything with them

> The thing where you can't query a PG is because the OSD is throttling 
> incoming work and the throttle is exhausted (the PG can't do work so it
> isn't making progress).  A workaround for jewel is to restart the OSD 
> serving the PG and do the query quickly after that (probably in a loop so
> that you catch it after it starts up but before the throttle is 
> exhausted again).  (In luminous this is fixed.)

Thank you for the clarification.

> Once you have the query output ('ceph tell $pgid query') you'll be able to
> tell what is preventing the PG from peering.

Hm.. what kind of loop do you suggest? When I do ceph tell $pgid query,
it hangs and never returns to the console.

> You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.

There is something strange here for 1.165: how is it possible that acting
is [37] when it isn't in the up set [84,38,48]?:

ceph pg map 1.165
osdmap e114855 pg 1.165 (1.165) -> up [84,38,48] acting [37]

The second one looks ok, but I still can't run a pg query on it:

[root@cc1 ~]# ceph pg map 1.60
osdmap e114855 pg 1.60 (1.60) -> up [66,84,40] acting [66,69,40]


Do I need to restart all three OSDs at the same time?

Can you advise how to unblock access to one of the pools for this kind
of command:

[root@cc1 ~]# rbd ls volumes
^C

The strace for this is here: https://pastebin.com/hpbDg6gP - this time it
hangs on some futex call. Are these cases (the pg query hang and the
rbd ls problem) connected to each other?

If I can find a solution for this, you will make my day (and night :) ).


Regards
Lukasz

> HTH!
> sage


>> that would get the cluster working again. Here is an strace of
>> the query command: https://pastebin.com/HpNFvR8Z. But... the cluster isn't entirely off:
>> 
>> [root@cc1 ~]# rbd ls management-vms
>> os-mongodb1
>> os-mongodb1-database
>> os-gitlab-root
>> os-mongodb1-database2
>> os-wiki-root
>> [root@cc1 ~]# rbd ls volumes
>> ^C
>> [root@cc1 ~]#
>> 
>> and the same for all mon hosts (I don't paste all three here):
>> 
>> [root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
>> os-mongodb1
>> os-mongodb1-database
>> os-gitlab-root
>> os-mongodb1-database2
>> os-wiki-root
>> [root@cc1 ~]# rbd -m 192.168.128.1 list volumes
>> ^C
>> [root@cc1 ~]#
>> 
>> and for all other pools on the list I can list images, except for the
>> (most important) volumes pool.
>> 
>> Funny thing, I can get rbd info for a particular image:
>> 
>> [root@cc1 ~]# rbd info
>> volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
>> rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
>>         size 20480 MB in 1280 objects
>>         order 24 (16384 kB objects)
>>         block_name_prefix: rbd_data.64a21a0a9acf52
>>         format: 2
>>         features: layering
>>         flags:
>>         parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
>>         overlap: 3072 MB
>> 
>> but I can't list the whole content of the volumes pool.
>> 
>> [root@cc1 ~]# ceph osd pool ls
>> volumes
>> images
>> backups
>> volumes-ssd-intel-s3700
>> management-vms
>> .rgw.root
>> .rgw.control
>> .rgw
>> .rgw.gc
>> .log
>> .users.uid
>> .rgw.buckets.index
>> .users
>> .rgw.buckets.extra
>> .rgw.buckets
>> volumes-cached
>> cache-ssd
>> 
>> here is ceph osd tree:
>> 
>> ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>  -7  20.88388 root ssd-intel-s3700
>> -11   3.19995     host ssd-stor1
>>  56   0.79999         osd.56            up  1.00000          1.00000
>>  57   0.79999         osd.57            up  1.00000          1.00000
>>  58   0.79999         osd.58            up  1.00000          1.00000
>>  59   0.79999         osd.59            up  1.00000          1.00000
>>  -9   2.12999     host ssd-stor2
>>  60   0.70999         osd.60            up  1.00000          1.00000
>>  61   0.70999         osd.61            up  1.00000          1.00000
>>  62   0.70999         osd.62            up  1.00000          1.00000
>>  -8   2.12999     host ssd-stor3
>>  63   0.70999         osd.63            up  1.00000          1.00000
>>  64   0.70999         osd.64            up  1.00000          1.00000
>>  65   0.70999         osd.65            up  1.00000          1.00000
>> -10   4.19998     host ssd-stor4
>>  25   0.70000         osd.25            up  1.00000          1.00000
>>  26   0.70000         osd.26            up  1.00000          1.00000
>>  27   0.70000         osd.27            up  1.00000          1.00000
>>  28   0.70000         osd.28            up  1.00000          1.00000
>>  29   0.70000         osd.29            up  1.00000          1.00000
>>  24   0.70000         osd.24            up  1.00000          1.00000
>> -12   3.41199     host ssd-stor5
>>  73   0.85300         osd.73            up  1.00000          1.00000
>>  74   0.85300         osd.74            up  1.00000          1.00000
>>  75   0.85300         osd.75            up  1.00000          1.00000
>>  76   0.85300         osd.76            up  1.00000          1.00000
>> -13   3.41199     host ssd-stor6
>>  77   0.85300         osd.77            up  1.00000          1.00000
>>  78   0.85300         osd.78            up  1.00000          1.00000
>>  79   0.85300         osd.79            up  1.00000          1.00000
>>  80   0.85300         osd.80            up  1.00000          1.00000
>> -15   2.39999     host ssd-stor7
>>  90   0.79999         osd.90            up  1.00000          1.00000
>>  91   0.79999         osd.91            up  1.00000          1.00000
>>  92   0.79999         osd.92            up  1.00000          1.00000
>>  -1 167.69969 root default
>>  -2  33.99994     host stor1
>>   6   3.39999         osd.6           down        0          1.00000
>>   7   3.39999         osd.7             up  1.00000          1.00000
>>   8   3.39999         osd.8             up  1.00000          1.00000
>>   9   3.39999         osd.9             up  1.00000          1.00000
>>  10   3.39999         osd.10          down        0          1.00000
>>  11   3.39999         osd.11          down        0          1.00000
>>  69   3.39999         osd.69            up  1.00000          1.00000
>>  70   3.39999         osd.70            up  1.00000          1.00000
>>  71   3.39999         osd.71          down        0          1.00000
>>  81   3.39999         osd.81            up  1.00000          1.00000
>>  -3  20.99991     host stor2
>>  13   2.09999         osd.13            up  1.00000          1.00000
>>  12   2.09999         osd.12            up  1.00000          1.00000
>>  14   2.09999         osd.14            up  1.00000          1.00000
>>  15   2.09999         osd.15            up  1.00000          1.00000
>>  16   2.09999         osd.16            up  1.00000          1.00000
>>  17   2.09999         osd.17            up  1.00000          1.00000
>>  18   2.09999         osd.18          down        0          1.00000
>>  19   2.09999         osd.19            up  1.00000          1.00000
>>  20   2.09999         osd.20            up  1.00000          1.00000
>>  21   2.09999         osd.21            up  1.00000          1.00000
>>  -4  25.00000     host stor3
>>  30   2.50000         osd.30            up  1.00000          1.00000
>>  31   2.50000         osd.31            up  1.00000          1.00000
>>  32   2.50000         osd.32            up  1.00000          1.00000
>>  33   2.50000         osd.33          down        0          1.00000
>>  34   2.50000         osd.34            up  1.00000          1.00000
>>  35   2.50000         osd.35            up  1.00000          1.00000
>>  66   2.50000         osd.66            up  1.00000          1.00000
>>  67   2.50000         osd.67            up  1.00000          1.00000
>>  68   2.50000         osd.68            up  1.00000          1.00000
>>  72   2.50000         osd.72          down        0          1.00000
>>  -5  25.00000     host stor4
>>  44   2.50000         osd.44            up  1.00000          1.00000
>>  45   2.50000         osd.45            up  1.00000          1.00000
>>  46   2.50000         osd.46          down        0          1.00000
>>  47   2.50000         osd.47            up  1.00000          1.00000
>>   0   2.50000         osd.0             up  1.00000          1.00000
>>   1   2.50000         osd.1             up  1.00000          1.00000
>>   2   2.50000         osd.2             up  1.00000          1.00000
>>   3   2.50000         osd.3             up  1.00000          1.00000
>>   4   2.50000         osd.4             up  1.00000          1.00000
>>   5   2.50000         osd.5             up  1.00000          1.00000
>>  -6  14.19991     host stor5
>>  48   1.79999         osd.48            up  1.00000          1.00000
>>  49   1.59999         osd.49            up  1.00000          1.00000
>>  50   1.79999         osd.50            up  1.00000          1.00000
>>  51   1.79999         osd.51          down        0          1.00000
>>  52   1.79999         osd.52            up  1.00000          1.00000
>>  53   1.79999         osd.53            up  1.00000          1.00000
>>  54   1.79999         osd.54            up  1.00000          1.00000
>>  55   1.79999         osd.55            up  1.00000          1.00000
>> -14  14.39999     host stor6
>>  82   1.79999         osd.82            up  1.00000          1.00000
>>  83   1.79999         osd.83            up  1.00000          1.00000
>>  84   1.79999         osd.84            up  1.00000          1.00000
>>  85   1.79999         osd.85            up  1.00000          1.00000
>>  86   1.79999         osd.86            up  1.00000          1.00000
>>  87   1.79999         osd.87            up  1.00000          1.00000
>>  88   1.79999         osd.88            up  1.00000          1.00000
>>  89   1.79999         osd.89            up  1.00000          1.00000
>> -16  12.59999     host stor7
>>  93   1.79999         osd.93            up  1.00000          1.00000
>>  94   1.79999         osd.94            up  1.00000          1.00000
>>  95   1.79999         osd.95            up  1.00000          1.00000
>>  96   1.79999         osd.96            up  1.00000          1.00000
>>  97   1.79999         osd.97            up  1.00000          1.00000
>>  98   1.79999         osd.98            up  1.00000          1.00000
>>  99   1.79999         osd.99            up  1.00000          1.00000
>> -17  21.49995     host stor8
>>  22   1.59999         osd.22            up  1.00000          1.00000
>>  23   1.59999         osd.23            up  1.00000          1.00000
>>  36   2.09999         osd.36            up  1.00000          1.00000
>>  37   2.09999         osd.37            up  1.00000          1.00000
>>  38   2.50000         osd.38            up  1.00000          1.00000
>>  39   2.50000         osd.39            up  1.00000          1.00000
>>  40   2.50000         osd.40            up  1.00000          1.00000
>>  41   2.50000         osd.41          down        0          1.00000
>>  42   2.50000         osd.42            up  1.00000          1.00000
>>  43   1.59999         osd.43            up  1.00000          1.00000
>> [root@cc1 ~]#
>> 
>> and ceph health detail:
>> 
>> ceph health detail | grep down
>> HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
>> peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
>> undersized; recovery 176211/14148564 objects degraded (1.245%);
>> recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
>> pg 1.60 is stuck inactive since forever, current state
>> down+remapped+peering, last acting [66,69,40]
>> pg 1.165 is stuck inactive since forever, current state
>> down+remapped+peering, last acting [37]
>> pg 1.60 is stuck unclean since forever, current state
>> down+remapped+peering, last acting [66,69,40]
>> pg 1.165 is stuck unclean since forever, current state
>> down+remapped+peering, last acting [37]
>> pg 1.165 is down+remapped+peering, acting [37]
>> pg 1.60 is down+remapped+peering, acting [66,69,40]
>> 
>> 
>> problematic pgs are 1.165 and 1.60.
>> 
>> Please advise how to unblock the volumes pool and/or get these two PGs
>> working - during the last night and day, when we tried to solve this issue,
>> these PGs ended up 100% empty of data.
>> 
>> 
>> 
>> 
>> -- 
>> Regards,
>>  Łukasz Chrustek
>> 
>> 
>> 



-- 
Regards,
 Łukasz Chrustek



* Re: Problem with query and any operation on PGs
       [not found]     ` <1464688590.20170523185052@tlen.pl>
@ 2017-05-23 17:40       ` Sage Weil
  2017-05-23 21:43         ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-23 17:40 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel


On Tue, 23 May 2017, Łukasz Chrustek wrote:
> I have not slept for over 30 hours and still can't find a solution. I
> did as you wrote, but turning off these osds
> (https://pastebin.com/1npBXeMV) didn't resolve the issue...

The important bit is:

            "blocked": "peering is blocked due to down osds",
            "down_osds_we_would_probe": [
                6,
                10,
                33,
                37,
                72
            ],
            "peering_blocked_by": [
                {
                    "osd": 6,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let 
us proceed"
                },
                {
                    "osd": 10,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let 
us proceed"
                },
                {
                    "osd": 37,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let 
us proceed"
                },
                {
                    "osd": 72,
                    "current_lost_at": 113771,
                    "comment": "starting or marking this osd lost may let 
us proceed"
                }
            ]
        },

Are any of those OSDs startable?
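
If they are, just starting them again should let the PG peer; roughly
(a sketch only, assuming systemd-managed OSDs -- run this on the host
that owns each osd):

  systemctl status ceph-osd@6     # can the daemon run at all?
  systemctl start  ceph-osd@6     # repeat for osd 10, 33, 37 and 72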

sage


> 
> Regards
> Lukasz Chrustek
> 
> 
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> Hello,
> >> 
> >> After a terrible outage caused by the failure of a 10 Gbit switch, the ceph cluster
> >> went to HEALTH_ERR (three whole storage servers went offline at the same time
> >> and didn't come back quickly). After the cluster recovered, two PGs ended up in an
> >> incomplete state; I can't query them, and I can't do anything with them
> 
> > The thing where you can't query a PG is because the OSD is throttling 
> > incoming work and the throttle is exhausted (the PG can't do work so it
> > isn't making progress).  A workaround for jewel is to restart the OSD 
> > serving the PG and do the query quickly after that (probably in a loop so
> > that you catch it after it starts up but before the throttle is 
> > exhausted again).  (In luminous this is fixed.)
> 
> > Once you have the query output ('ceph tell $pgid query') you'll be able to
> > tell what is preventing the PG from peering.
> 
> > You can identify the osd(s) hosting the pg with 'ceph pg map $pgid'.
> 
> > HTH!
> > sage
> 
> 
> >> that would get the cluster working again. Here is an strace of
> >> the query command: https://pastebin.com/HpNFvR8Z. But... the cluster isn't entirely off:
> >> 
> >> [root@cc1 ~]# rbd ls management-vms
> >> os-mongodb1
> >> os-mongodb1-database
> >> os-gitlab-root
> >> os-mongodb1-database2
> >> os-wiki-root
> >> [root@cc1 ~]# rbd ls volumes
> >> ^C
> >> [root@cc1 ~]#
> >> 
> >> and the same for all mon hosts (I don't paste all three here):
> >> 
> >> [root@cc1 ~]# rbd -m 192.168.128.1 list management-vms
> >> os-mongodb1
> >> os-mongodb1-database
> >> os-gitlab-root
> >> os-mongodb1-database2
> >> os-wiki-root
> >> [root@cc1 ~]# rbd -m 192.168.128.1 list volumes
> >> ^C
> >> [root@cc1 ~]#
> >> 
> >> and for all other pools on the list I can list images, except for the
> >> (most important) volumes pool.
> >> 
> >> Funny thing, I can get rbd info for a particular image:
> >> 
> >> [root@cc1 ~]# rbd info
> >> volumes/volume-197602d7-40f9-40ad-b286-cdec688b1497
> >> rbd image 'volume-197602d7-40f9-40ad-b286-cdec688b1497':
> >>         size 20480 MB in 1280 objects
> >>         order 24 (16384 kB objects)
> >>         block_name_prefix: rbd_data.64a21a0a9acf52
> >>         format: 2
> >>         features: layering
> >>         flags:
> >>         parent: images/37bdf0ca-f1f3-46ce-95b9-c04bb9ac8a53@snap
> >>         overlap: 3072 MB
> >> 
> >> but I can't list the whole content of the volumes pool.
> >> 
> >> [root@cc1 ~]# ceph osd pool ls
> >> volumes
> >> images
> >> backups
> >> volumes-ssd-intel-s3700
> >> management-vms
> >> .rgw.root
> >> .rgw.control
> >> .rgw
> >> .rgw.gc
> >> .log
> >> .users.uid
> >> .rgw.buckets.index
> >> .users
> >> .rgw.buckets.extra
> >> .rgw.buckets
> >> volumes-cached
> >> cache-ssd
> >> 
> >> here is ceph osd tree:
> >> 
> >> ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>  -7  20.88388 root ssd-intel-s3700
> >> -11   3.19995     host ssd-stor1
> >>  56   0.79999         osd.56            up  1.00000          1.00000
> >>  57   0.79999         osd.57            up  1.00000          1.00000
> >>  58   0.79999         osd.58            up  1.00000          1.00000
> >>  59   0.79999         osd.59            up  1.00000          1.00000
> >>  -9   2.12999     host ssd-stor2
> >>  60   0.70999         osd.60            up  1.00000          1.00000
> >>  61   0.70999         osd.61            up  1.00000          1.00000
> >>  62   0.70999         osd.62            up  1.00000          1.00000
> >>  -8   2.12999     host ssd-stor3
> >>  63   0.70999         osd.63            up  1.00000          1.00000
> >>  64   0.70999         osd.64            up  1.00000          1.00000
> >>  65   0.70999         osd.65            up  1.00000          1.00000
> >> -10   4.19998     host ssd-stor4
> >>  25   0.70000         osd.25            up  1.00000          1.00000
> >>  26   0.70000         osd.26            up  1.00000          1.00000
> >>  27   0.70000         osd.27            up  1.00000          1.00000
> >>  28   0.70000         osd.28            up  1.00000          1.00000
> >>  29   0.70000         osd.29            up  1.00000          1.00000
> >>  24   0.70000         osd.24            up  1.00000          1.00000
> >> -12   3.41199     host ssd-stor5
> >>  73   0.85300         osd.73            up  1.00000          1.00000
> >>  74   0.85300         osd.74            up  1.00000          1.00000
> >>  75   0.85300         osd.75            up  1.00000          1.00000
> >>  76   0.85300         osd.76            up  1.00000          1.00000
> >> -13   3.41199     host ssd-stor6
> >>  77   0.85300         osd.77            up  1.00000          1.00000
> >>  78   0.85300         osd.78            up  1.00000          1.00000
> >>  79   0.85300         osd.79            up  1.00000          1.00000
> >>  80   0.85300         osd.80            up  1.00000          1.00000
> >> -15   2.39999     host ssd-stor7
> >>  90   0.79999         osd.90            up  1.00000          1.00000
> >>  91   0.79999         osd.91            up  1.00000          1.00000
> >>  92   0.79999         osd.92            up  1.00000          1.00000
> >>  -1 167.69969 root default
> >>  -2  33.99994     host stor1
> >>   6   3.39999         osd.6           down        0          1.00000
> >>   7   3.39999         osd.7             up  1.00000          1.00000
> >>   8   3.39999         osd.8             up  1.00000          1.00000
> >>   9   3.39999         osd.9             up  1.00000          1.00000
> >>  10   3.39999         osd.10          down        0          1.00000
> >>  11   3.39999         osd.11          down        0          1.00000
> >>  69   3.39999         osd.69            up  1.00000          1.00000
> >>  70   3.39999         osd.70            up  1.00000          1.00000
> >>  71   3.39999         osd.71          down        0          1.00000
> >>  81   3.39999         osd.81            up  1.00000          1.00000
> >>  -3  20.99991     host stor2
> >>  13   2.09999         osd.13            up  1.00000          1.00000
> >>  12   2.09999         osd.12            up  1.00000          1.00000
> >>  14   2.09999         osd.14            up  1.00000          1.00000
> >>  15   2.09999         osd.15            up  1.00000          1.00000
> >>  16   2.09999         osd.16            up  1.00000          1.00000
> >>  17   2.09999         osd.17            up  1.00000          1.00000
> >>  18   2.09999         osd.18          down        0          1.00000
> >>  19   2.09999         osd.19            up  1.00000          1.00000
> >>  20   2.09999         osd.20            up  1.00000          1.00000
> >>  21   2.09999         osd.21            up  1.00000          1.00000
> >>  -4  25.00000     host stor3
> >>  30   2.50000         osd.30            up  1.00000          1.00000
> >>  31   2.50000         osd.31            up  1.00000          1.00000
> >>  32   2.50000         osd.32            up  1.00000          1.00000
> >>  33   2.50000         osd.33          down        0          1.00000
> >>  34   2.50000         osd.34            up  1.00000          1.00000
> >>  35   2.50000         osd.35            up  1.00000          1.00000
> >>  66   2.50000         osd.66            up  1.00000          1.00000
> >>  67   2.50000         osd.67            up  1.00000          1.00000
> >>  68   2.50000         osd.68            up  1.00000          1.00000
> >>  72   2.50000         osd.72          down        0          1.00000
> >>  -5  25.00000     host stor4
> >>  44   2.50000         osd.44            up  1.00000          1.00000
> >>  45   2.50000         osd.45            up  1.00000          1.00000
> >>  46   2.50000         osd.46          down        0          1.00000
> >>  47   2.50000         osd.47            up  1.00000          1.00000
> >>   0   2.50000         osd.0             up  1.00000          1.00000
> >>   1   2.50000         osd.1             up  1.00000          1.00000
> >>   2   2.50000         osd.2             up  1.00000          1.00000
> >>   3   2.50000         osd.3             up  1.00000          1.00000
> >>   4   2.50000         osd.4             up  1.00000          1.00000
> >>   5   2.50000         osd.5             up  1.00000          1.00000
> >>  -6  14.19991     host stor5
> >>  48   1.79999         osd.48            up  1.00000          1.00000
> >>  49   1.59999         osd.49            up  1.00000          1.00000
> >>  50   1.79999         osd.50            up  1.00000          1.00000
> >>  51   1.79999         osd.51          down        0          1.00000
> >>  52   1.79999         osd.52            up  1.00000          1.00000
> >>  53   1.79999         osd.53            up  1.00000          1.00000
> >>  54   1.79999         osd.54            up  1.00000          1.00000
> >>  55   1.79999         osd.55            up  1.00000          1.00000
> >> -14  14.39999     host stor6
> >>  82   1.79999         osd.82            up  1.00000          1.00000
> >>  83   1.79999         osd.83            up  1.00000          1.00000
> >>  84   1.79999         osd.84            up  1.00000          1.00000
> >>  85   1.79999         osd.85            up  1.00000          1.00000
> >>  86   1.79999         osd.86            up  1.00000          1.00000
> >>  87   1.79999         osd.87            up  1.00000          1.00000
> >>  88   1.79999         osd.88            up  1.00000          1.00000
> >>  89   1.79999         osd.89            up  1.00000          1.00000
> >> -16  12.59999     host stor7
> >>  93   1.79999         osd.93            up  1.00000          1.00000
> >>  94   1.79999         osd.94            up  1.00000          1.00000
> >>  95   1.79999         osd.95            up  1.00000          1.00000
> >>  96   1.79999         osd.96            up  1.00000          1.00000
> >>  97   1.79999         osd.97            up  1.00000          1.00000
> >>  98   1.79999         osd.98            up  1.00000          1.00000
> >>  99   1.79999         osd.99            up  1.00000          1.00000
> >> -17  21.49995     host stor8
> >>  22   1.59999         osd.22            up  1.00000          1.00000
> >>  23   1.59999         osd.23            up  1.00000          1.00000
> >>  36   2.09999         osd.36            up  1.00000          1.00000
> >>  37   2.09999         osd.37            up  1.00000          1.00000
> >>  38   2.50000         osd.38            up  1.00000          1.00000
> >>  39   2.50000         osd.39            up  1.00000          1.00000
> >>  40   2.50000         osd.40            up  1.00000          1.00000
> >>  41   2.50000         osd.41          down        0          1.00000
> >>  42   2.50000         osd.42            up  1.00000          1.00000
> >>  43   1.59999         osd.43            up  1.00000          1.00000
> >> [root@cc1 ~]#
> >> 
> >> and ceph health detail:
> >> 
> >> ceph health detail | grep down
> >> HEALTH_WARN 23 pgs backfilling; 23 pgs degraded; 2 pgs down; 2 pgs
> >> peering; 2 pgs stuck inactive; 25 pgs stuck unclean; 23 pgs
> >> undersized; recovery 176211/14148564 objects degraded (1.245%);
> >> recovery 238972/14148564 objects misplaced (1.689%); noout flag(s) set
> >> pg 1.60 is stuck inactive since forever, current state
> >> down+remapped+peering, last acting [66,69,40]
> >> pg 1.165 is stuck inactive since forever, current state
> >> down+remapped+peering, last acting [37]
> >> pg 1.60 is stuck unclean since forever, current state
> >> down+remapped+peering, last acting [66,69,40]
> >> pg 1.165 is stuck unclean since forever, current state
> >> down+remapped+peering, last acting [37]
> >> pg 1.165 is down+remapped+peering, acting [37]
> >> pg 1.60 is down+remapped+peering, acting [66,69,40]
> >> 
> >> 
> >> problematic pgs are 1.165 and 1.60.
> >> 
> >> Please advise how to unblock the volumes pool and/or get these two PGs
> >> working - during the last night and day, when we tried to solve this issue,
> >> these PGs ended up 100% empty of data.
> >> 
> >> 
> >> 
> >> 
> >> -- 
> >> Regards,
> >>  Łukasz Chrustek
> >> 
> >> 
> >> 
> 
> 
> 
> -- 
> Regards,
>  Łukasz Chrustek
> 
> 


* Re: Problem with query and any operation on PGs
  2017-05-23 17:40       ` Sage Weil
@ 2017-05-23 21:43         ` Łukasz Chrustek
  2017-05-23 21:48           ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-23 21:43 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> I have not slept for over 30 hours and still can't find a solution. I
>> did as you wrote, but turning off these osds
>> (https://pastebin.com/1npBXeMV) didn't resolve the issue...

> The important bit is:

>             "blocked": "peering is blocked due to down osds",
>             "down_osds_we_would_probe": [
>                 6,
>                 10,
>                 33,
>                 37,
>                 72
>             ],
>             "peering_blocked_by": [
>                 {
>                     "osd": 6,
>                     "current_lost_at": 0,
>                     "comment": "starting or marking this osd lost may let
> us proceed"
>                 },
>                 {
>                     "osd": 10,
>                     "current_lost_at": 0,
>                     "comment": "starting or marking this osd lost may let
> us proceed"
>                 },
>                 {
>                     "osd": 37,
>                     "current_lost_at": 0,
>                     "comment": "starting or marking this osd lost may let
> us proceed"
>                 },
>                 {
>                     "osd": 72,
>                     "current_lost_at": 113771,
>                     "comment": "starting or marking this osd lost may let
> us proceed"
>                 }
>             ]
>         },

> Are any of those OSDs startable?

They were all up and running - but I decided to shut them down and out
them from ceph. Now it looks like ceph is working ok, but two PGs are
still in the down state. How do I get rid of that?

ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+remapped+peering, last acting [38,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+remapped+peering, acting [38,48]
[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115175: 100 osds: 88 up, 86 in; 2 remapped pgs
      pgmap v67583069: 3520 pgs, 17 pools, 26675 GB data, 4849 kobjects
            76638 GB used, 107 TB / 182 TB avail
                3515 active+clean
                   3 active+clean+scrubbing+deep
                   2 down+remapped+peering
  client io 0 B/s rd, 869 kB/s wr, 14 op/s rd, 113 op/s wr

-- 
Regards
 Łukasz Chrustek



* Re: Problem with query and any operation on PGs
  2017-05-23 21:43         ` Łukasz Chrustek
@ 2017-05-23 21:48           ` Sage Weil
  2017-05-24 13:19             ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-23 21:48 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel


On Tue, 23 May 2017, Łukasz Chrustek wrote:
> Hi,
> 
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> I have not slept for over 30 hours and still can't find a solution. I
> >> did as you wrote, but turning off these osds
> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
> 
> > The important bit is:
> 
> >             "blocked": "peering is blocked due to down osds",
> >             "down_osds_we_would_probe": [
> >                 6,
> >                 10,
> >                 33,
> >                 37,
> >                 72
> >             ],
> >             "peering_blocked_by": [
> >                 {
> >                     "osd": 6,
> >                     "current_lost_at": 0,
> >                     "comment": "starting or marking this osd lost may let
> > us proceed"
> >                 },
> >                 {
> >                     "osd": 10,
> >                     "current_lost_at": 0,
> >                     "comment": "starting or marking this osd lost may let
> > us proceed"
> >                 },
> >                 {
> >                     "osd": 37,
> >                     "current_lost_at": 0,
> >                     "comment": "starting or marking this osd lost may let
> > us proceed"
> >                 },
> >                 {
> >                     "osd": 72,
> >                     "current_lost_at": 113771,
> >                     "comment": "starting or marking this osd lost may let
> > us proceed"
> >                 }
> >             ]
> >         },
> 
> > Are any of those OSDs startable?
> 
> They were all up and running - but I decided to shut them down and out
> them from ceph. Now it looks like ceph is working ok, but two PGs are
> still in the down state. How do I get rid of that?

If you haven't deleted the data, you should start the OSDs back up.

If they are partially damaged you can use ceph-objectstore-tool to 
extract just the PGs in question to make sure you haven't lost anything, 
inject them on some other OSD(s) and restart those, and *then* mark the 
bad OSDs as 'lost'.

If all else fails, you can just mark those OSDs 'lost', but in doing so 
you might be telling the cluster to lose data.

The best thing to do is definitely to get those OSDs started again.
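
Roughly like this (a sketch only -- the paths, osd ids and pg id below are
examples; the source and destination OSDs must be stopped while
ceph-objectstore-tool runs):

  # on the host with the bad osd (e.g. osd.37): export the stuck pg
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 \
      --journal-path /var/lib/ceph/osd/ceph-37/journal \
      --pgid 1.165 --op export --file /tmp/pg.1.165.export

  # on a healthy (stopped) osd: import it, then start that osd again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
      --journal-path /var/lib/ceph/osd/ceph-38/journal \
      --op import --file /tmp/pg.1.165.export

  # only after the data is safe (or you accept losing it):
  ceph osd lost 37 --yes-i-really-mean-it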

sage


> 
> ceph health detail
> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state down+remapped+peering, last acting [38,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+remapped+peering, acting [38,48]
> [root@cc1 ~]# ceph -s
>     cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
>      health HEALTH_WARN
>             2 pgs down
>             2 pgs peering
>             2 pgs stuck inactive
>      monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
>             election epoch 872, quorum 0,1,2 cc1,cc2,cc3
>      osdmap e115175: 100 osds: 88 up, 86 in; 2 remapped pgs
>       pgmap v67583069: 3520 pgs, 17 pools, 26675 GB data, 4849 kobjects
>             76638 GB used, 107 TB / 182 TB avail
>                 3515 active+clean
>                    3 active+clean+scrubbing+deep
>                    2 down+remapped+peering
>   client io 0 B/s rd, 869 kB/s wr, 14 op/s rd, 113 op/s wr
> 
> -- 
> Regards
>  Łukasz Chrustek
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-23 21:48           ` Sage Weil
@ 2017-05-24 13:19             ` Łukasz Chrustek
  2017-05-24 13:37               ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 13:19 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Cześć,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
>> >> did,      as      You      wrote,     but     turning     off     this
>> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> 
>> > The important bit is:
>> 
>> >             "blocked": "peering is blocked due to down osds",
>> >             "down_osds_we_would_probe": [
>> >                 6,
>> >                 10,
>> >                 33,
>> >                 37,
>> >                 72
>> >             ],
>> >             "peering_blocked_by": [
>> >                 {
>> >                     "osd": 6,
>> >                     "current_lost_at": 0,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 },
>> >                 {
>> >                     "osd": 10,
>> >                     "current_lost_at": 0,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 },
>> >                 {
>> >                     "osd": 37,
>> >                     "current_lost_at": 0,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 },
>> >                 {
>> >                     "osd": 72,
>> >                     "current_lost_at": 113771,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 }
>> >             ]
>> >         },
>> 
>> > Are any of those OSDs startable?
>> 
>> They were all up and running - but I decided to shut them down and out
>> them  from  ceph, now it looks like ceph working ok, but still two PGs
>> are in down state, how to get rid of it ?

> If you haven't deleted the data, you should start the OSDs back up.

> If they are partially damanged you can use ceph-objectstore-tool to 
> extract just the PGs in question to make sure you haven't lost anything,
> inject them on some other OSD(s) and restart those, and *then* mark the
> bad OSDs as 'lost'.

> If all else fails, you can just mark those OSDs 'lost', but in doing so
> you might be telling the cluster to lose data.

> The best thing to do is definitely to get those OSDs started again.

Now the situation looks like this:

[root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
        size 500 GB in 128000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.ed9d394a851426
        format: 2
        features: layering
        flags:

[root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
(output cut)
rbd_data.ed9d394a851426.000000000000447c
rbd_data.ed9d394a851426.0000000000010857
rbd_data.ed9d394a851426.000000000000ec8b
rbd_data.ed9d394a851426.000000000000fa43
rbd_data.ed9d394a851426.000000000001ef2d
^C

It hangs on this object and doesn't go any further. rbd cp also hangs...
rbd map does too...

Can you advise what the solution for this case could be?


-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 13:19             ` Łukasz Chrustek
@ 2017-05-24 13:37               ` Sage Weil
  2017-05-24 13:58                 ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 13:37 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3882 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >> 
> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
> >> >> did,      as      You      wrote,     but     turning     off     this
> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> >> 
> >> > The important bit is:
> >> 
> >> >             "blocked": "peering is blocked due to down osds",
> >> >             "down_osds_we_would_probe": [
> >> >                 6,
> >> >                 10,
> >> >                 33,
> >> >                 37,
> >> >                 72
> >> >             ],
> >> >             "peering_blocked_by": [
> >> >                 {
> >> >                     "osd": 6,
> >> >                     "current_lost_at": 0,
> >> >                     "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> >                 },
> >> >                 {
> >> >                     "osd": 10,
> >> >                     "current_lost_at": 0,
> >> >                     "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> >                 },
> >> >                 {
> >> >                     "osd": 37,
> >> >                     "current_lost_at": 0,
> >> >                     "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> >                 },
> >> >                 {
> >> >                     "osd": 72,
> >> >                     "current_lost_at": 113771,
> >> >                     "comment": "starting or marking this osd lost may let
> >> > us proceed"
> >> >                 }
> >> >             ]
> >> >         },
> >> 
> >> > Are any of those OSDs startable?
> >> 
> >> They were all up and running - but I decided to shut them down and out
> >> them  from  ceph, now it looks like ceph working ok, but still two PGs
> >> are in down state, how to get rid of it ?
> 
> > If you haven't deleted the data, you should start the OSDs back up.
> 
> > If they are partially damanged you can use ceph-objectstore-tool to 
> > extract just the PGs in question to make sure you haven't lost anything,
> > inject them on some other OSD(s) and restart those, and *then* mark the
> > bad OSDs as 'lost'.
> 
> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> > you might be telling the cluster to lose data.
> 
> > The best thing to do is definitely to get those OSDs started again.
> 
> Now situation looks like this:
> 
> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
>         size 500 GB in 128000 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.ed9d394a851426
>         format: 2
>         features: layering
>         flags:
> 
> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
> (output cutted)
> rbd_data.ed9d394a851426.000000000000447c
> rbd_data.ed9d394a851426.0000000000010857
> rbd_data.ed9d394a851426.000000000000ec8b
> rbd_data.ed9d394a851426.000000000000fa43
> rbd_data.ed9d394a851426.000000000001ef2d
> ^C
> 
> it hangs on this object and isn't going further. rbd cp also hangs...
> rbd map - also...
> 
> can  You advice what can be solution for this case ?

The hang is due to OSD throttling (see my first reply for how to work 
around that and get a pg query).  But you already did that, and the cluster 
told you which OSDs it needs to see up in order for it to peer and 
recover.  If you haven't destroyed those disks, you should start those 
osds and it should be fine.  If you've destroyed the data or the disks are 
truly broken and dead, then you can mark those OSDs lost and the cluster 
*may* recover (but it's hard to say given the information you've shared).
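
For reference, the thing to re-check once those osds are up is just the pg
query again, e.g. (pgids taken from your health detail):

  ceph pg 1.60 query  > /tmp/pg-1.60.json
  ceph pg 1.165 query > /tmp/pg-1.165.json

and look at recovery_state -> down_osds_we_would_probe / peering_blocked_by
in the output.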

sage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 13:37               ` Sage Weil
@ 2017-05-24 13:58                 ` Łukasz Chrustek
  2017-05-24 14:02                   ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 13:58 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Cześć,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >> 
>> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
>> >> >> did,      as      You      wrote,     but     turning     off     this
>> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> >> 
>> >> > The important bit is:
>> >> 
>> >> >             "blocked": "peering is blocked due to down osds",
>> >> >             "down_osds_we_would_probe": [
>> >> >                 6,
>> >> >                 10,
>> >> >                 33,
>> >> >                 37,
>> >> >                 72
>> >> >             ],
>> >> >             "peering_blocked_by": [
>> >> >                 {
>> >> >                     "osd": 6,
>> >> >                     "current_lost_at": 0,
>> >> >                     "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> >                 },
>> >> >                 {
>> >> >                     "osd": 10,
>> >> >                     "current_lost_at": 0,
>> >> >                     "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> >                 },
>> >> >                 {
>> >> >                     "osd": 37,
>> >> >                     "current_lost_at": 0,
>> >> >                     "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> >                 },
>> >> >                 {
>> >> >                     "osd": 72,
>> >> >                     "current_lost_at": 113771,
>> >> >                     "comment": "starting or marking this osd lost may let
>> >> > us proceed"
>> >> >                 }
>> >> >             ]
>> >> >         },
>> >> 
>> >> > Are any of those OSDs startable?
>> >> 
>> >> They were all up and running - but I decided to shut them down and out
>> >> them  from  ceph, now it looks like ceph working ok, but still two PGs
>> >> are in down state, how to get rid of it ?
>> 
>> > If you haven't deleted the data, you should start the OSDs back up.
>> 
>> > If they are partially damanged you can use ceph-objectstore-tool to 
>> > extract just the PGs in question to make sure you haven't lost anything,
>> > inject them on some other OSD(s) and restart those, and *then* mark the
>> > bad OSDs as 'lost'.
>> 
>> > If all else fails, you can just mark those OSDs 'lost', but in doing so
>> > you might be telling the cluster to lose data.
>> 
>> > The best thing to do is definitely to get those OSDs started again.
>> 
>> Now situation looks like this:
>> 
>> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
>> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
>>         size 500 GB in 128000 objects
>>         order 22 (4096 kB objects)
>>         block_name_prefix: rbd_data.ed9d394a851426
>>         format: 2
>>         features: layering
>>         flags:
>> 
>> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
>> (output cutted)
>> rbd_data.ed9d394a851426.000000000000447c
>> rbd_data.ed9d394a851426.0000000000010857
>> rbd_data.ed9d394a851426.000000000000ec8b
>> rbd_data.ed9d394a851426.000000000000fa43
>> rbd_data.ed9d394a851426.000000000001ef2d
>> ^C
>> 
>> it hangs on this object and isn't going further. rbd cp also hangs...
>> rbd map - also...
>> 
>> can  You advice what can be solution for this case ?

> The hang is due to OSD throttling (see my first reply for how to wrok 
> around that and get a pg query).  But you already did that and the cluster
> told you which OSDs it needs to see up in order for it to peer and 
> recover.  If you haven't destroyed those disks, you should start those
> osds and it shoudl be fine.  If you've destroyed the data or the disks are
> truly broken and dead, then you can mark those OSDs lost and the cluster
> *maybe* recover (but hard to say given the information you've shared).

> sage

What information can I provide so you can say whether it is recoverable?

Here are ceph -s and ceph health detail:

[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
      pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
            76705 GB used, 107 TB / 182 TB avail
                4030 active+clean
                   1 down+remapped+peering
                   1 down+peering
  client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
[root@cc1 ~]# ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+peering, acting [67,88,48]
[root@cc1 ~]#

-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 13:58                 ` Łukasz Chrustek
@ 2017-05-24 14:02                   ` Sage Weil
  2017-05-24 14:18                     ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 14:02 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5806 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >> 
> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> Cześć,
> >> >> 
> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
> >> >> >> did,      as      You      wrote,     but     turning     off     this
> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> >> >> 
> >> >> > The important bit is:
> >> >> 
> >> >> >             "blocked": "peering is blocked due to down osds",
> >> >> >             "down_osds_we_would_probe": [
> >> >> >                 6,
> >> >> >                 10,
> >> >> >                 33,
> >> >> >                 37,
> >> >> >                 72
> >> >> >             ],
> >> >> >             "peering_blocked_by": [
> >> >> >                 {
> >> >> >                     "osd": 6,
> >> >> >                     "current_lost_at": 0,
> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
> >> >> >                 },
> >> >> >                 {
> >> >> >                     "osd": 10,
> >> >> >                     "current_lost_at": 0,
> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
> >> >> >                 },
> >> >> >                 {
> >> >> >                     "osd": 37,
> >> >> >                     "current_lost_at": 0,
> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> > us proceed"
> >> >> >                 },
> >> >> >                 {
> >> >> >                     "osd": 72,
> >> >> >                     "current_lost_at": 113771,
> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> > us proceed"

These are the osds (6, 10, 37, 72).

> >> >> >                 }
> >> >> >             ]
> >> >> >         },
> >> >> 
> >> >> > Are any of those OSDs startable?

This

> >> >> 
> >> >> They were all up and running - but I decided to shut them down and out
> >> >> them  from  ceph, now it looks like ceph working ok, but still two PGs
> >> >> are in down state, how to get rid of it ?
> >> 
> >> > If you haven't deleted the data, you should start the OSDs back up.

This

> >> 
> >> > If they are partially damanged you can use ceph-objectstore-tool to 
> >> > extract just the PGs in question to make sure you haven't lost anything,
> >> > inject them on some other OSD(s) and restart those, and *then* mark the
> >> > bad OSDs as 'lost'.
> >> 
> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> >> > you might be telling the cluster to lose data.
> >> 
> >> > The best thing to do is definitely to get those OSDs started again.

This

> >> 
> >> Now situation looks like this:
> >> 
> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
> >>         size 500 GB in 128000 objects
> >>         order 22 (4096 kB objects)
> >>         block_name_prefix: rbd_data.ed9d394a851426
> >>         format: 2
> >>         features: layering
> >>         flags:
> >> 
> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
> >> (output cutted)
> >> rbd_data.ed9d394a851426.000000000000447c
> >> rbd_data.ed9d394a851426.0000000000010857
> >> rbd_data.ed9d394a851426.000000000000ec8b
> >> rbd_data.ed9d394a851426.000000000000fa43
> >> rbd_data.ed9d394a851426.000000000001ef2d
> >> ^C
> >> 
> >> it hangs on this object and isn't going further. rbd cp also hangs...
> >> rbd map - also...
> >> 
> >> can  You advice what can be solution for this case ?
> 
> > The hang is due to OSD throttling (see my first reply for how to wrok 
> > around that and get a pg query).  But you already did that and the cluster
> > told you which OSDs it needs to see up in order for it to peer and 
> > recover.  If you haven't destroyed those disks, you should start those

> > osds and it shoudl be fine.  If you've destroyed the data or the disks are
> > truly broken and dead, then you can mark those OSDs lost and the cluster
> > *maybe* recover (but hard to say given the information you've shared).

This

> 
> > sage
> 
> What information I can bring to You to say it is recoverable ?
> 
> here are ceph -s and ceph health detail:
> 
> [root@cc1 ~]# ceph -s
>     cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
>      health HEALTH_WARN
>             2 pgs down
>             2 pgs peering
>             2 pgs stuck inactive
>      monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
>             election epoch 872, quorum 0,1,2 cc1,cc2,cc3
>      osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
>       pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
>             76705 GB used, 107 TB / 182 TB avail
>                 4030 active+clean
>                    1 down+remapped+peering
>                    1 down+peering
>   client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
> [root@cc1 ~]# ceph health detail
> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+peering, acting [67,88,48]
> [root@cc1 ~]#
> 
> -- 
> Regards,
>  Łukasz Chrustek
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 14:02                   ` Sage Weil
@ 2017-05-24 14:18                     ` Łukasz Chrustek
  2017-05-24 14:47                       ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 14:18 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Cześć,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >> 
>> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> Cześć,
>> >> >> 
>> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
>> >> >> >> did,      as      You      wrote,     but     turning     off     this
>> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> >> >> 
>> >> >> > The important bit is:
>> >> >> 
>> >> >> >             "blocked": "peering is blocked due to down osds",
>> >> >> >             "down_osds_we_would_probe": [
>> >> >> >                 6,
>> >> >> >                 10,
>> >> >> >                 33,
>> >> >> >                 37,
>> >> >> >                 72
>> >> >> >             ],
>> >> >> >             "peering_blocked_by": [
>> >> >> >                 {
>> >> >> >                     "osd": 6,
>> >> >> >                     "current_lost_at": 0,
>> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
>> >> >> >                 },
>> >> >> >                 {
>> >> >> >                     "osd": 10,
>> >> >> >                     "current_lost_at": 0,
>> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
>> >> >> >                 },
>> >> >> >                 {
>> >> >> >                     "osd": 37,
>> >> >> >                     "current_lost_at": 0,
>> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"
>> >> >> >                 },
>> >> >> >                 {
>> >> >> >                     "osd": 72,
>> >> >> >                     "current_lost_at": 113771,
>> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> > us proceed"

> These are the osds (6, 10, 37, 72).

>> >> >> >                 }
>> >> >> >             ]
>> >> >> >         },
>> >> >> 
>> >> >> > Are any of those OSDs startable?

> This

osd 6 - isn't startable

osd 10, 37, 72 are startable

>> >> >> 
>> >> >> They were all up and running - but I decided to shut them down and out
>> >> >> them  from  ceph, now it looks like ceph working ok, but still two PGs
>> >> >> are in down state, how to get rid of it ?
>> >> 
>> >> > If you haven't deleted the data, you should start the OSDs back up.

> This

By OSD backup, do you mean copying /var/lib/ceph/osd/ceph-72/* to some other
(non-ceph) disk?

>> >> 
>> >> > If they are partially damanged you can use ceph-objectstore-tool to 
>> >> > extract just the PGs in question to make sure you haven't lost anything,
>> >> > inject them on some other OSD(s) and restart those, and *then* mark the
>> >> > bad OSDs as 'lost'.
>> >> 
>> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
>> >> > you might be telling the cluster to lose data.
>> >> 
>> >> > The best thing to do is definitely to get those OSDs started again.

> This

There were actions on these PGs that destroyed them. I started those
OSDs (the three that are startable) - this didn't solve the situation.
I should add that there are other pools on this cluster; the problem is
only with the pool that contains the broken/down PGs.
>> >> 
>> >> Now situation looks like this:
>> >> 
>> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
>> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
>> >>         size 500 GB in 128000 objects
>> >>         order 22 (4096 kB objects)
>> >>         block_name_prefix: rbd_data.ed9d394a851426
>> >>         format: 2
>> >>         features: layering
>> >>         flags:
>> >> 
>> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
>> >> (output cutted)
>> >> rbd_data.ed9d394a851426.000000000000447c
>> >> rbd_data.ed9d394a851426.0000000000010857
>> >> rbd_data.ed9d394a851426.000000000000ec8b
>> >> rbd_data.ed9d394a851426.000000000000fa43
>> >> rbd_data.ed9d394a851426.000000000001ef2d
>> >> ^C
>> >> 
>> >> it hangs on this object and isn't going further. rbd cp also hangs...
>> >> rbd map - also...
>> >> 
>> >> can  You advice what can be solution for this case ?
>> 
>> > The hang is due to OSD throttling (see my first reply for how to wrok 
>> > around that and get a pg query).  But you already did that and the cluster
>> > told you which OSDs it needs to see up in order for it to peer and 
>> > recover.  If you haven't destroyed those disks, you should start those

>> > osds and it shoudl be fine.  If you've destroyed the data or the disks are
>> > truly broken and dead, then you can mark those OSDs lost and the cluster
>> > *maybe* recover (but hard to say given the information you've shared).

> This


[root@cc1 ~]# ceph osd lost 10 --yes-i-really-mean-it
marked osd lost in epoch 115310
[root@cc1 ~]# ceph osd lost 37 --yes-i-really-mean-it
marked osd lost in epoch 115314
[root@cc1 ~]# ceph osd lost 72 --yes-i-really-mean-it
marked osd lost in epoch 115317
[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
      pgmap v67642483: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
            76718 GB used, 107 TB / 182 TB avail
                4030 active+clean
                   1 down+remapped+peering
                   1 down+peering
  client io 14624 kB/s rd, 31619 kB/s wr, 382 op/s rd, 228 op/s wr
[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
      pgmap v67642485: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
            76718 GB used, 107 TB / 182 TB avail
                4030 active+clean
                   1 down+remapped+peering
                   1 down+peering
  client io 17805 kB/s rd, 18787 kB/s wr, 215 op/s rd, 107 op/s wr

>> 
>> > sage
>> 
>> What information I can bring to You to say it is recoverable ?
>> 
>> here are ceph -s and ceph health detail:
>> 
>> [root@cc1 ~]# ceph -s
>>     cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
>>      health HEALTH_WARN
>>             2 pgs down
>>             2 pgs peering
>>             2 pgs stuck inactive
>>      monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
>>             election epoch 872, quorum 0,1,2 cc1,cc2,cc3
>>      osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
>>       pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
>>             76705 GB used, 107 TB / 182 TB avail
>>                 4030 active+clean
>>                    1 down+remapped+peering
>>                    1 down+peering
>>   client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
>> [root@cc1 ~]# ceph health detail
>> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
>> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
>> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>> [root@cc1 ~]#
>> 
>> -- 
>> Regards,
>>  Łukasz Chrustek
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 



-- 
Pozdrowienia,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 14:18                     ` Łukasz Chrustek
@ 2017-05-24 14:47                       ` Sage Weil
  2017-05-24 15:00                         ` Łukasz Chrustek
  2017-05-24 21:38                         ` Łukasz Chrustek
  0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-24 14:47 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 9357 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >> 
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Cześć,
> >> >> 
> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> Cześć,
> >> >> >> 
> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
> >> >> >> >> did,      as      You      wrote,     but     turning     off     this
> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> >> >> >> 
> >> >> >> > The important bit is:
> >> >> >> 
> >> >> >> >             "blocked": "peering is blocked due to down osds",
> >> >> >> >             "down_osds_we_would_probe": [
> >> >> >> >                 6,
> >> >> >> >                 10,
> >> >> >> >                 33,
> >> >> >> >                 37,
> >> >> >> >                 72
> >> >> >> >             ],
> >> >> >> >             "peering_blocked_by": [
> >> >> >> >                 {
> >> >> >> >                     "osd": 6,
> >> >> >> >                     "current_lost_at": 0,
> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> >> >> >> >                 },
> >> >> >> >                 {
> >> >> >> >                     "osd": 10,
> >> >> >> >                     "current_lost_at": 0,
> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> >> >> >> >                 },
> >> >> >> >                 {
> >> >> >> >                     "osd": 37,
> >> >> >> >                     "current_lost_at": 0,
> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> >> >> >> >                 },
> >> >> >> >                 {
> >> >> >> >                     "osd": 72,
> >> >> >> >                     "current_lost_at": 113771,
> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> > us proceed"
> 
> > These are the osds (6, 10, 37, 72).
> 
> >> >> >> >                 }
> >> >> >> >             ]
> >> >> >> >         },
> >> >> >> 
> >> >> >> > Are any of those OSDs startable?
> 
> > This
> 
> osd 6 - isn't startable

Disk completely 100% dead, or just broken enough that ceph-osd won't 
start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs 
from this osd to recover any important writes on that osd.

> osd 10, 37, 72 are startable

With those started, I'd repeat the original sequence and get a fresh pg 
query to confirm that it still wants just osd.6.

Use ceph-objectstore-tool to export the pg from osd.6, stop some other 
random osd (not one of these ones), import the pg into that osd, and start 
it again.  Once it is up, 'ceph osd lost 6'.  The pg *should* peer at that 
point.  Repeat the same basic process for the other pg.
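
Concretely, something like this (a sketch only -- shown for pg 1.165,
assuming the fresh query for it still points at osd.6; NN is whichever
healthy osd you pick, and paths depend on your layout):

  # with osd.6 stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
      --journal-path /var/lib/ceph/osd/ceph-6/journal \
      --pgid 1.165 --op export --file /root/pg-1.165.export

  # stop osd.NN (not one of 10/37/72), import, then start it again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
      --journal-path /var/lib/ceph/osd/ceph-NN/journal \
      --op import --file /root/pg-1.165.export

  ceph osd lost 6 --yes-i-really-mean-it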

s


> 
> >> >> >> 
> >> >> >> They were all up and running - but I decided to shut them down and out
> >> >> >> them  from  ceph, now it looks like ceph working ok, but still two PGs
> >> >> >> are in down state, how to get rid of it ?
> >> >> 
> >> >> > If you haven't deleted the data, you should start the OSDs back up.
> 
> > This
> 
> By OSDs backup You mean copy /var/lib/ceph/osd/ceph-72/* to some other
> (non ceph) disk ?
> 
> >> >> 
> >> >> > If they are partially damanged you can use ceph-objectstore-tool to 
> >> >> > extract just the PGs in question to make sure you haven't lost anything,
> >> >> > inject them on some other OSD(s) and restart those, and *then* mark the
> >> >> > bad OSDs as 'lost'.
> >> >> 
> >> >> > If all else fails, you can just mark those OSDs 'lost', but in doing so
> >> >> > you might be telling the cluster to lose data.
> >> >> 
> >> >> > The best thing to do is definitely to get those OSDs started again.
> 
> > This
> 
> There were actions on this PGs, that make them destroy. I started this
> osds   (these  three,  which  are  startable)  -  this  dosn't  solved
> situation.  I  need to add, that on this cluster are other pools, only
> with pool with broken/down PGs is problem.
> >> >> 
> >> >> Now situation looks like this:
> >> >> 
> >> >> [root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
> >> >> rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
> >> >>         size 500 GB in 128000 objects
> >> >>         order 22 (4096 kB objects)
> >> >>         block_name_prefix: rbd_data.ed9d394a851426
> >> >>         format: 2
> >> >>         features: layering
> >> >>         flags:
> >> >> 
> >> >> [root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
> >> >> (output cutted)
> >> >> rbd_data.ed9d394a851426.000000000000447c
> >> >> rbd_data.ed9d394a851426.0000000000010857
> >> >> rbd_data.ed9d394a851426.000000000000ec8b
> >> >> rbd_data.ed9d394a851426.000000000000fa43
> >> >> rbd_data.ed9d394a851426.000000000001ef2d
> >> >> ^C
> >> >> 
> >> >> it hangs on this object and isn't going further. rbd cp also hangs...
> >> >> rbd map - also...
> >> >> 
> >> >> can  You advice what can be solution for this case ?
> >> 
> >> > The hang is due to OSD throttling (see my first reply for how to wrok 
> >> > around that and get a pg query).  But you already did that and the cluster
> >> > told you which OSDs it needs to see up in order for it to peer and 
> >> > recover.  If you haven't destroyed those disks, you should start those
> 
> >> > osds and it shoudl be fine.  If you've destroyed the data or the disks are
> >> > truly broken and dead, then you can mark those OSDs lost and the cluster
> >> > *maybe* recover (but hard to say given the information you've shared).
> 
> > This
> 
> 
> [root@cc1 ~]# ceph osd lost 10 --yes-i-really-mean-it
> marked osd lost in epoch 115310
> [root@cc1 ~]# ceph osd lost 37 --yes-i-really-mean-it
> marked osd lost in epoch 115314
> [root@cc1 ~]# ceph osd lost 72 --yes-i-really-mean-it
> marked osd lost in epoch 115317
> [root@cc1 ~]# ceph -s
>     cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
>      health HEALTH_WARN
>             2 pgs down
>             2 pgs peering
>             2 pgs stuck inactive
>      monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
>             election epoch 872, quorum 0,1,2 cc1,cc2,cc3
>      osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
>       pgmap v67642483: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
>             76718 GB used, 107 TB / 182 TB avail
>                 4030 active+clean
>                    1 down+remapped+peering
>                    1 down+peering
>   client io 14624 kB/s rd, 31619 kB/s wr, 382 op/s rd, 228 op/s wr
> [root@cc1 ~]# ceph -s
>     cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
>      health HEALTH_WARN
>             2 pgs down
>             2 pgs peering
>             2 pgs stuck inactive
>      monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
>             election epoch 872, quorum 0,1,2 cc1,cc2,cc3
>      osdmap e115434: 100 osds: 89 up, 86 in; 1 remapped pgs
>       pgmap v67642485: 4032 pgs, 18 pools, 26713 GB data, 4857 kobjects
>             76718 GB used, 107 TB / 182 TB avail
>                 4030 active+clean
>                    1 down+remapped+peering
>                    1 down+peering
>   client io 17805 kB/s rd, 18787 kB/s wr, 215 op/s rd, 107 op/s wr
> 
> >> 
> >> > sage
> >> 
> >> What information I can bring to You to say it is recoverable ?
> >> 
> >> here are ceph -s and ceph health detail:
> >> 
> >> [root@cc1 ~]# ceph -s
> >>     cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
> >>      health HEALTH_WARN
> >>             2 pgs down
> >>             2 pgs peering
> >>             2 pgs stuck inactive
> >>      monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
> >>             election epoch 872, quorum 0,1,2 cc1,cc2,cc3
> >>      osdmap e115431: 100 osds: 89 up, 86 in; 1 remapped pgs
> >>       pgmap v67641261: 4032 pgs, 18 pools, 26706 GB data, 4855 kobjects
> >>             76705 GB used, 107 TB / 182 TB avail
> >>                 4030 active+clean
> >>                    1 down+remapped+peering
> >>                    1 down+peering
> >>   client io 5704 kB/s rd, 24685 kB/s wr, 49 op/s rd, 165 op/s wr
> >> [root@cc1 ~]# ceph health detail
> >> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
> >> pg 1.165 is stuck inactive since forever, current state down+peering, last acting [67,88,48]
> >> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
> >> pg 1.60 is down+remapped+peering, acting [66,40]
> >> pg 1.165 is down+peering, acting [67,88,48]
> >> [root@cc1 ~]#
> >> 
> >> -- 
> >> Regards,
> >>  Łukasz Chrustek
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> 
> >> 
> 
> 
> 
> -- 
> Pozdrowienia,
>  Łukasz Chrustek
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 14:47                       ` Sage Weil
@ 2017-05-24 15:00                         ` Łukasz Chrustek
  2017-05-24 15:07                           ` Łukasz Chrustek
  2017-05-24 15:11                           ` Sage Weil
  2017-05-24 21:38                         ` Łukasz Chrustek
  1 sibling, 2 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:00 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >> 
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Cześć,
>> >> >> 
>> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Cześć,
>> >> >> >> 
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
>> >> >> >> >> did,      as      You      wrote,     but     turning     off     this
>> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> >> >> >> 
>> >> >> >> > The important bit is:
>> >> >> >> 
>> >> >> >> >             "blocked": "peering is blocked due to down osds",
>> >> >> >> >             "down_osds_we_would_probe": [
>> >> >> >> >                 6,
>> >> >> >> >                 10,
>> >> >> >> >                 33,
>> >> >> >> >                 37,
>> >> >> >> >                 72
>> >> >> >> >             ],
>> >> >> >> >             "peering_blocked_by": [
>> >> >> >> >                 {
>> >> >> >> >                     "osd": 6,
>> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> >> >> >> >                 },
>> >> >> >> >                 {
>> >> >> >> >                     "osd": 10,
>> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> >> >> >> >                 },
>> >> >> >> >                 {
>> >> >> >> >                     "osd": 37,
>> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> >> >> >> >                 },
>> >> >> >> >                 {
>> >> >> >> >                     "osd": 72,
>> >> >> >> >                     "current_lost_at": 113771,
>> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> > us proceed"
>> 
>> > These are the osds (6, 10, 37, 72).
>> 
>> >> >> >> >                 }
>> >> >> >> >             ]
>> >> >> >> >         },
>> >> >> >> 
>> >> >> >> > Are any of those OSDs startable?
>> 
>> > This
>> 
>> osd 6 - isn't startable

> Disk completely 100% dead, or just borken enough that ceph-osd won't 
> start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> from this osd to recover any important writes on that osd.

2017-05-24 11:21:23.341938 7f6830a36940  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
2017-05-24 11:21:23.350180 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
2017-05-24 11:21:23.350610 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
2017-05-24 11:21:23.350617 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-05-24 11:21:23.350633 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
2017-05-24 11:21:23.351897 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-05-24 11:21:23.351951 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
2017-05-24 11:21:23.351970 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
2017-05-24 11:21:23.351981 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
2017-05-24 11:21:23.351984 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
2017-05-24 11:21:23.351987 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
2017-05-24 11:21:23.351996 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
2017-05-24 11:21:23.352573 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
2017-05-24 11:21:23.353001 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
2017-05-24 11:21:23.353012 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
2017-05-24 11:21:23.353016 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
2017-05-24 11:21:23.353021 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
2017-05-24 11:21:23.353022 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
2017-05-24 11:21:23.353027 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
2017-05-24 11:21:23.355156 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m

That is all I get in the logs for this OSD when I try to start it.

>> osd 10, 37, 72 are startable

> With those started, I'd repeat the original sequence and get a fresh pg
> query to confirm that it still wants just osd.6.

Do you mean the procedure with the loop, taking down the OSDs that the
broken PGs are pointing to?
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+peering, acting [67,88,48]

So for pg 1.60: take osd.66 down, then check pg query in a loop?



> use ceph-objectstore-tool to export the pg from osd.6, stop some other
> ranodm osd (not one of these ones), import the pg into that osd, and start
> again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> point.  repeat with the same basic process with the other pg.

I have already done 'ceph osd lost 6'; do I need to do it once again?


-- 
Regards
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 15:00                         ` Łukasz Chrustek
@ 2017-05-24 15:07                           ` Łukasz Chrustek
  2017-05-24 15:11                           ` Sage Weil
  1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel






> it is all I get for this osd in logs, when I try to start it.

>>> osd 10, 37, 72 are startable

>> With those started, I'd repeat the original sequence and get a fresh pg
>> query to confirm that it still wants just osd.6.

> You  mean about procedure with loop and taking down OSDs, which broken
> PGs are pointing to ?
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+peering, acting [67,88,48]

> for pg 1.60 <--> 66 down, then in loop check pg query ?



>> use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> ranodm osd (not one of these ones), import the pg into that osd, and start
>> again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> point.  repeat with the same basic process with the other pg.

> I have already did 'ceph osd lost 6', do I need to do this once again ?

/dev/sdb1       3,7T   34M  3,7T   1% /var/lib/ceph/osd/ceph-6

This disk has no data; it was migrated away while this OSD was still
able to come up.

-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 15:00                         ` Łukasz Chrustek
  2017-05-24 15:07                           ` Łukasz Chrustek
@ 2017-05-24 15:11                           ` Sage Weil
  2017-05-24 15:24                             ` Łukasz Chrustek
  2017-05-24 15:54                             ` Łukasz Chrustek
  1 sibling, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-24 15:11 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 7223 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:

> Hello,
> 
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >> 
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Cześć,
> >> >> 
> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> >> Cześć,
> >> >> >> 
> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> >> Cześć,
> >> >> >> >> 
> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
> >> >> >> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
> >> >> >> >> >> did,      as      You      wrote,     but     turning     off     this
> >> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
> >> >> >> >> 
> >> >> >> >> > The important bit is:
> >> >> >> >> 
> >> >> >> >> >             "blocked": "peering is blocked due to down osds",
> >> >> >> >> >             "down_osds_we_would_probe": [
> >> >> >> >> >                 6,
> >> >> >> >> >                 10,
> >> >> >> >> >                 33,
> >> >> >> >> >                 37,
> >> >> >> >> >                 72
> >> >> >> >> >             ],
> >> >> >> >> >             "peering_blocked_by": [
> >> >> >> >> >                 {
> >> >> >> >> >                     "osd": 6,
> >> >> >> >> >                     "current_lost_at": 0,
> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> >> >> >> >                 },
> >> >> >> >> >                 {
> >> >> >> >> >                     "osd": 10,
> >> >> >> >> >                     "current_lost_at": 0,
> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> >> >> >> >                 },
> >> >> >> >> >                 {
> >> >> >> >> >                     "osd": 37,
> >> >> >> >> >                     "current_lost_at": 0,
> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> >> >> >> >                 },
> >> >> >> >> >                 {
> >> >> >> >> >                     "osd": 72,
> >> >> >> >> >                     "current_lost_at": 113771,
> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
> >> >> >> >> > us proceed"
> >> 
> >> > These are the osds (6, 10, 37, 72).
> >> 
> >> >> >> >> >                 }
> >> >> >> >> >             ]
> >> >> >> >> >         },
> >> >> >> >> 
> >> >> >> >> > Are any of those OSDs startable?
> >> 
> >> > This
> >> 
> >> osd 6 - isn't startable
> 
> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> > from this osd to recover any important writes on that osd.
> 
> 2017-05-24 11:21:23.341938 7f6830a36940  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
> 2017-05-24 11:21:23.350180 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
> 2017-05-24 11:21:23.350610 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
> 2017-05-24 11:21:23.350617 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2017-05-24 11:21:23.350633 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
> 2017-05-24 11:21:23.351897 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-05-24 11:21:23.351951 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
> 2017-05-24 11:21:23.351970 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
> 2017-05-24 11:21:23.351981 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
> 2017-05-24 11:21:23.351984 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
> 2017-05-24 11:21:23.351987 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
> 2017-05-24 11:21:23.351996 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
> 2017-05-24 11:21:23.352573 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
> 2017-05-24 11:21:23.353001 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
> 2017-05-24 11:21:23.353012 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
> 2017-05-24 11:21:23.353016 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
> 2017-05-24 11:21:23.353021 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
> 2017-05-24 11:21:23.353022 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
> 2017-05-24 11:21:23.353027 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
> 2017-05-24 11:21:23.355156 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
> 2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m
> 
> it is all I get for this osd in logs, when I try to start it.
> 
> >> osd 10, 37, 72 are startable
> 
> > With those started, I'd repeat the original sequence and get a fresh pg
> > query to confirm that it still wants just osd.6.
> 
> You  mean about procedure with loop and taking down OSDs, which broken
> PGs are pointing to ?
> pg 1.60 is down+remapped+peering, acting [66,40]
> pg 1.165 is down+peering, acting [67,88,48]
> 
> for pg 1.60 <--> 66 down, then in loop check pg query ?

Right.
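
Something like (a sketch -- pg 1.60 and osd.66 taken from your health
detail; stop the osd however you normally do on your boxes):

  systemctl stop ceph-osd@66          # or: ceph osd down 66
  while ! timeout 60 ceph pg 1.60 query > /tmp/pg-1.60.json; do sleep 5; done
  grep -A 20 down_osds_we_would_probe /tmp/pg-1.60.json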

> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> > ranodm osd (not one of these ones), import the pg into that osd, and start
> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> > point.  repeat with the same basic process with the other pg.
> 
> I have already did 'ceph osd lost 6', do I need to do this once again ?

Hmm, not sure; if the OSD is empty then there is no harm in doing it again.  
Try that first since it might resolve it.  If not, do the query loop 
above.

s

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 15:11                           ` Sage Weil
@ 2017-05-24 15:24                             ` Łukasz Chrustek
  2017-05-24 15:54                             ` Łukasz Chrustek
  1 sibling, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:24 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

>> 
>> >> osd 10, 37, 72 are startable
>> 
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>> 
>> You  mean about procedure with loop and taking down OSDs, which broken
>> PGs are pointing to ?
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>> 
>> for pg 1.60 <--> 66 down, then in loop check pg query ?

> Right.

>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> > point.  repeat with the same basic process with the other pg.
>> 
>> I have already did 'ceph osd lost 6', do I need to do this once again ?

> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
> Try that first since it might resolve it.  If not, do the query loop 
> above.
[root@cc1 ~]# ceph osd lost 6 --yes-i-really-mean-it
marked osd lost in epoch 113414
[root@cc1 ~]#
[root@cc1 ~]# ceph -s
    cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
     health HEALTH_WARN
            2 pgs down
            2 pgs peering
            2 pgs stuck inactive
     monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
            election epoch 872, quorum 0,1,2 cc1,cc2,cc3
     osdmap e115449: 100 osds: 88 up, 86 in; 1 remapped pgs
      pgmap v67646402: 4032 pgs, 18 pools, 26733 GB data, 4862 kobjects
            76759 GB used, 107 TB / 182 TB avail
                4030 active+clean
                   1 down+peering
                   1 down+remapped+peering
  client io 57154 kB/s rd, 1189 kB/s wr, 95 op/s


Marking this osd as lost again had no effect.


-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 15:11                           ` Sage Weil
  2017-05-24 15:24                             ` Łukasz Chrustek
@ 2017-05-24 15:54                             ` Łukasz Chrustek
  2017-05-24 16:02                               ` Łukasz Chrustek
  1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 15:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:

>> Hello,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >> 
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Cześć,
>> >> >> 
>> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Cześć,
>> >> >> >> 
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> Cześć,
>> >> >> >> >> 
>> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> >> I'm  not  sleeping for over 30 hours, and still can't find solution. I
>> >> >> >> >> >> did,      as      You      wrote,     but     turning     off     this
>> >> >> >> >> >> (https://pastebin.com/1npBXeMV) osds didn't resolve issue...
>> >> >> >> >> 
>> >> >> >> >> > The important bit is:
>> >> >> >> >> 
>> >> >> >> >> >             "blocked": "peering is blocked due to down osds",
>> >> >> >> >> >             "down_osds_we_would_probe": [
>> >> >> >> >> >                 6,
>> >> >> >> >> >                 10,
>> >> >> >> >> >                 33,
>> >> >> >> >> >                 37,
>> >> >> >> >> >                 72
>> >> >> >> >> >             ],
>> >> >> >> >> >             "peering_blocked_by": [
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 6,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 10,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 37,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 72,
>> >> >> >> >> >                     "current_lost_at": 113771,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let
>> >> >> >> >> > us proceed"
>> >> 
>> >> > These are the osds (6, 10, 37, 72).
>> >> 
>> >> >> >> >> >                 }
>> >> >> >> >> >             ]
>> >> >> >> >> >         },
>> >> >> >> >> 
>> >> >> >> >> > Are any of those OSDs startable?
>> >> 
>> >> > This
>> >> 
>> >> osd 6 - isn't startable
>> 
>> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
>> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>> 
>> 2017-05-24 11:21:23.341938 7f6830a36940  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
>> 2017-05-24 11:21:23.350180 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
>> 2017-05-24 11:21:23.350610 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
>> 2017-05-24 11:21:23.350617 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2017-05-24 11:21:23.350633 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
>> 2017-05-24 11:21:23.351897 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2017-05-24 11:21:23.351951 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
>> 2017-05-24 11:21:23.351970 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
>> 2017-05-24 11:21:23.351981 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
>> 2017-05-24 11:21:23.351984 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.351987 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
>> 2017-05-24 11:21:23.351996 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
>> 2017-05-24 11:21:23.352573 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
>> 2017-05-24 11:21:23.353001 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
>> 2017-05-24 11:21:23.353012 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
>> 2017-05-24 11:21:23.353016 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
>> 2017-05-24 11:21:23.353021 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
>> 2017-05-24 11:21:23.353022 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.353027 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
>> 2017-05-24 11:21:23.355156 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
>> 2017-05-24 11:21:23.356411 7f6830a36940 -1 ^[[0;31m ** ERROR: osd init failed: (22) Invalid argument^[[0m
>> 
>> it is all I get for this osd in logs, when I try to start it.
>> 
>> >> osd 10, 37, 72 are startable
>> 
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>> 
>> You  mean about procedure with loop and taking down OSDs, which broken
>> PGs are pointing to ?
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>> 
>> for pg 1.60 <--> 66 down, then in loop check pg query ?

> Right.

And now it is very weird... I brought osd.37 up and ran the loop

while true; do ceph tell 1.165 query; done

which caught this:

https://pastebin.com/zKu06fJn

Can you tell what is wrong now?

>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> > point.  repeat with the same basic process with the other pg.
>> 
>> I have already did 'ceph osd lost 6', do I need to do this once again ?

> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
> Try that first since it might resolve it.  If not, do the query loop 
> above.

> s



-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 15:54                             ` Łukasz Chrustek
@ 2017-05-24 16:02                               ` Łukasz Chrustek
  2017-05-24 17:07                                 ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 16:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> And  now  it  is very weird.... I made osd.37 up, and loop
> while true;do; ceph tell 1.165 query ;done

I need to explain a bit more - all I did was start ceph-osd id=37 on the
storage node; in ceph osd tree this osd is marked as out:


-17  21.49995     host stor8
 22   1.59999         osd.22            up  1.00000          1.00000 
 23   1.59999         osd.23            up  1.00000          1.00000 
 36   2.09999         osd.36            up  1.00000          1.00000 
 37   2.09999         osd.37            up        0          1.00000 
 38   2.50000         osd.38            up  1.00000          1.00000 
 39   2.50000         osd.39            up  1.00000          1.00000 
 40   2.50000         osd.40            up        0          1.00000 
 41   2.50000         osd.41          down        0          1.00000 
 42   2.50000         osd.42            up  1.00000          1.00000 
 43   1.59999         osd.43            up  1.00000          1.00000

 After starting this osd, ceph tell 1.165 query worked for only one call of this command.
> catch this:

> https://pastebin.com/zKu06fJn

> Can You tell, what is wrong now ?


>>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>>> > point.  repeat with the same basic process with the other pg.
>>> 
>>> I have already did 'ceph osd lost 6', do I need to do this once again ?

>> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
>> Try that first since it might resolve it.  If not, do the query loop 
>> above.

>> s






-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 16:02                               ` Łukasz Chrustek
@ 2017-05-24 17:07                                 ` Łukasz Chrustek
  2017-05-24 17:16                                   ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel



>> And  now  it  is very weird.... I made osd.37 up, and loop
>> while true;do; ceph tell 1.165 query ;done

> Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
> storage node, in ceph osd tree this osd osd is marked as out:


> -17  21.49995     host stor8
>  22   1.59999         osd.22            up  1.00000          1.00000 
>  23   1.59999         osd.23            up  1.00000          1.00000 
>  36   2.09999         osd.36            up  1.00000          1.00000 
>  37   2.09999         osd.37            up        0          1.00000 
>  38   2.50000         osd.38            up  1.00000          1.00000 
>  39   2.50000         osd.39            up  1.00000          1.00000 
>  40   2.50000         osd.40            up        0          1.00000 
>  41   2.50000         osd.41          down        0          1.00000 
>  42   2.50000         osd.42            up  1.00000          1.00000 
>  43   1.59999         osd.43            up  1.00000          1.00000

>  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
>> catch this:

>> https://pastebin.com/zKu06fJn

Here is the same query output for pg 1.60:

https://pastebin.com/Xuk5iFXr

>> Can You tell, what is wrong now ?


>>>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>>>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>>>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>>>> > point.  repeat with the same basic process with the other pg.
>>>> 
>>>> I have already did 'ceph osd lost 6', do I need to do this once again ?

>>> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
>>> Try that first since it might resolve it.  If not, do the query loop 
>>> above.

>>> s









-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 17:07                                 ` Łukasz Chrustek
@ 2017-05-24 17:16                                   ` Sage Weil
  2017-05-24 17:28                                     ` Łukasz Chrustek
  2017-05-24 17:30                                     ` Łukasz Chrustek
  0 siblings, 2 replies; 35+ messages in thread
From: Sage Weil @ 2017-05-24 17:16 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1424 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> 
> >> And  now  it  is very weird.... I made osd.37 up, and loop
> >> while true;do; ceph tell 1.165 query ;done
> 
> > Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
> > storage node, in ceph osd tree this osd osd is marked as out:
> 
> 
> > -17  21.49995     host stor8
> >  22   1.59999         osd.22            up  1.00000          1.00000 
> >  23   1.59999         osd.23            up  1.00000          1.00000 
> >  36   2.09999         osd.36            up  1.00000          1.00000 
> >  37   2.09999         osd.37            up        0          1.00000 
> >  38   2.50000         osd.38            up  1.00000          1.00000 
> >  39   2.50000         osd.39            up  1.00000          1.00000 
> >  40   2.50000         osd.40            up        0          1.00000 
> >  41   2.50000         osd.41          down        0          1.00000 
> >  42   2.50000         osd.42            up  1.00000          1.00000 
> >  43   1.59999         osd.43            up  1.00000          1.00000
> 
> >  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
> >> catch this:
> 
> >> https://pastebin.com/zKu06fJn
> 
> here is for pg 1.60:
> 
> https://pastebin.com/Xuk5iFXr

Look at the bottom, after it says

            "blocked": "peering is blocked due to down osds",

Did the 1.165 pg recover?
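
(Something like this should pull just that section out of a fresh query -
plain grep, the key name is the one quoted above:

  ceph tell 1.60 query | grep -A 25 '"blocked"'
)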

sage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 17:16                                   ` Sage Weil
@ 2017-05-24 17:28                                     ` Łukasz Chrustek
  2017-05-24 18:16                                       ` Sage Weil
  2017-05-24 17:30                                     ` Łukasz Chrustek
  1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:28 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> 
>> >> And  now  it  is very weird.... I made osd.37 up, and loop
>> >> while true;do; ceph tell 1.165 query ;done
>> 
>> > Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
>> > storage node, in ceph osd tree this osd osd is marked as out:
>> 
>> 
>> > -17  21.49995     host stor8
>> >  22   1.59999         osd.22            up  1.00000          1.00000 
>> >  23   1.59999         osd.23            up  1.00000          1.00000 
>> >  36   2.09999         osd.36            up  1.00000          1.00000 
>> >  37   2.09999         osd.37            up        0          1.00000 
>> >  38   2.50000         osd.38            up  1.00000          1.00000 
>> >  39   2.50000         osd.39            up  1.00000          1.00000 
>> >  40   2.50000         osd.40            up        0          1.00000 
>> >  41   2.50000         osd.41          down        0          1.00000 
>> >  42   2.50000         osd.42            up  1.00000          1.00000 
>> >  43   1.59999         osd.43            up  1.00000          1.00000
>> 
>> >  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
>> >> catch this:
>> 
>> >> https://pastebin.com/zKu06fJn
>> 
>> here is for pg 1.60:
>> 
>> https://pastebin.com/Xuk5iFXr

> Look at the bottom, after it says

>             "blocked": "peering is blocked due to down osds",

> Did the 1.165 pg recover?

No it didn't:

[root@cc1 ~]# ceph health detail
HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
pg 1.60 is down+remapped+peering, acting [68]
pg 1.165 is incomplete, acting [67,88,48]
[root@cc1 ~]#

-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 17:16                                   ` Sage Weil
  2017-05-24 17:28                                     ` Łukasz Chrustek
@ 2017-05-24 17:30                                     ` Łukasz Chrustek
  2017-05-24 17:35                                       ` Łukasz Chrustek
  1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:30 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> 
>> >> And  now  it  is very weird.... I made osd.37 up, and loop
>> >> while true;do; ceph tell 1.165 query ;done
>> 
>> > Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
>> > storage node, in ceph osd tree this osd osd is marked as out:
>> 
>> 
>> > -17  21.49995     host stor8
>> >  22   1.59999         osd.22            up  1.00000          1.00000 
>> >  23   1.59999         osd.23            up  1.00000          1.00000 
>> >  36   2.09999         osd.36            up  1.00000          1.00000 
>> >  37   2.09999         osd.37            up        0          1.00000 
>> >  38   2.50000         osd.38            up  1.00000          1.00000 
>> >  39   2.50000         osd.39            up  1.00000          1.00000 
>> >  40   2.50000         osd.40            up        0          1.00000 
>> >  41   2.50000         osd.41          down        0          1.00000 
>> >  42   2.50000         osd.42            up  1.00000          1.00000 
>> >  43   1.59999         osd.43            up  1.00000          1.00000
>> 
>> >  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
>> >> catch this:
>> 
>> >> https://pastebin.com/zKu06fJn
>> 
>> here is for pg 1.60:
>> 
>> https://pastebin.com/Xuk5iFXr

> Look at the bottom, after it says

>             "blocked": "peering is blocked due to down osds",

For pg 1.60: all of its osds were down when the ceph tell 1.60 query loop
caught that single reply.

> Did the 1.165 pg recover?

> sage



-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 17:30                                     ` Łukasz Chrustek
@ 2017-05-24 17:35                                       ` Łukasz Chrustek
  0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 17:35 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

>> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>>> 
>>> >> And  now  it  is very weird.... I made osd.37 up, and loop
>>> >> while true;do; ceph tell 1.165 query ;done
>>> 
>>> > Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
>>> > storage node, in ceph osd tree this osd osd is marked as out:
>>> 
>>> 
>>> > -17  21.49995     host stor8
>>> >  22   1.59999         osd.22            up  1.00000          1.00000 
>>> >  23   1.59999         osd.23            up  1.00000          1.00000 
>>> >  36   2.09999         osd.36            up  1.00000          1.00000 
>>> >  37   2.09999         osd.37            up        0          1.00000 
>>> >  38   2.50000         osd.38            up  1.00000          1.00000 
>>> >  39   2.50000         osd.39            up  1.00000          1.00000 
>>> >  40   2.50000         osd.40            up        0          1.00000 
>>> >  41   2.50000         osd.41          down        0          1.00000 
>>> >  42   2.50000         osd.42            up  1.00000          1.00000 
>>> >  43   1.59999         osd.43            up  1.00000          1.00000
>>> 
>>> >  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
>>> >> catch this:
>>> 
>>> >> https://pastebin.com/zKu06fJn
>>> 
>>> here is for pg 1.60:
>>> 
>>> https://pastebin.com/Xuk5iFXr

>> Look at the bottom, after it says

>>             "blocked": "peering is blocked due to down osds",

> for   pg  1.60: all osds was down, when ceph tell 1.60 query catch one
> 'interrupt'.

When I try to use ceph-objectstore-tool I get:

[root@stor3 ~]# ceph-objectstore-tool --op export --pgid 1.60 --data-path /mnt --journal-path /mnt/journal --file 1.60.export
Mount failed with '(95) Operation not supported'

[root@stor3 ~]# du -sh /mnt/current/1.60_head
276M    /mnt/current/1.60_head
[root@stor3 ~]# ls -al /mnt/current/1.60_head | wc -l
49
[root@stor3 ~]#

-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 17:28                                     ` Łukasz Chrustek
@ 2017-05-24 18:16                                       ` Sage Weil
  2017-05-24 19:47                                         ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 18:16 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2211 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> 
> >> >> And  now  it  is very weird.... I made osd.37 up, and loop
> >> >> while true;do; ceph tell 1.165 query ;done
> >> 
> >> > Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
> >> > storage node, in ceph osd tree this osd osd is marked as out:
> >> 
> >> 
> >> > -17  21.49995     host stor8
> >> >  22   1.59999         osd.22            up  1.00000          1.00000 
> >> >  23   1.59999         osd.23            up  1.00000          1.00000 
> >> >  36   2.09999         osd.36            up  1.00000          1.00000 
> >> >  37   2.09999         osd.37            up        0          1.00000 
> >> >  38   2.50000         osd.38            up  1.00000          1.00000 
> >> >  39   2.50000         osd.39            up  1.00000          1.00000 
> >> >  40   2.50000         osd.40            up        0          1.00000 
> >> >  41   2.50000         osd.41          down        0          1.00000 
> >> >  42   2.50000         osd.42            up  1.00000          1.00000 
> >> >  43   1.59999         osd.43            up  1.00000          1.00000
> >> 
> >> >  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
> >> >> catch this:
> >> 
> >> >> https://pastebin.com/zKu06fJn
> >> 
> >> here is for pg 1.60:
> >> 
> >> https://pastebin.com/Xuk5iFXr
> 
> > Look at the bottom, after it says
> 
> >             "blocked": "peering is blocked due to down osds",
> 
> > Did the 1.165 pg recover?
> 
> No it didn't:
> 
> [root@cc1 ~]# ceph health detail
> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
> pg 1.60 is down+remapped+peering, acting [68]
> pg 1.165 is incomplete, acting [67,88,48]
> [root@cc1 ~]#

Hrm.

 ceph daemon osd.67 config set debug_osd 20
 ceph daemon osd.67 config set debug_ms 1
 ceph osd down 67

and capture the resulting log segment, then post it with
ceph-post-file.
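
(Roughly, as a sketch - the log path is the stock one, and the last two
lines just drop the verbosity back down afterwards:

  ceph daemon osd.67 config set debug_osd 20
  ceph daemon osd.67 config set debug_ms 1
  ceph osd down 67       # forces osd.67 to re-peer and log the attempt
  # wait for the pg to go through peering, then upload the log:
  ceph-post-file /var/log/ceph/ceph-osd.67.log
  ceph daemon osd.67 config set debug_osd 0/5
  ceph daemon osd.67 config set debug_ms 0/5
)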

sage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 18:16                                       ` Sage Weil
@ 2017-05-24 19:47                                         ` Łukasz Chrustek
  0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 19:47 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> 
>> >> >> And  now  it  is very weird.... I made osd.37 up, and loop
>> >> >> while true;do; ceph tell 1.165 query ;done
>> >> 
>> >> > Here  need  to  explain  more  - all I did was start ceph-osd id=37 on
>> >> > storage node, in ceph osd tree this osd osd is marked as out:
>> >> 
>> >> 
>> >> > -17  21.49995     host stor8
>> >> >  22   1.59999         osd.22            up  1.00000          1.00000 
>> >> >  23   1.59999         osd.23            up  1.00000          1.00000 
>> >> >  36   2.09999         osd.36            up  1.00000          1.00000 
>> >> >  37   2.09999         osd.37            up        0          1.00000 
>> >> >  38   2.50000         osd.38            up  1.00000          1.00000 
>> >> >  39   2.50000         osd.39            up  1.00000          1.00000 
>> >> >  40   2.50000         osd.40            up        0          1.00000 
>> >> >  41   2.50000         osd.41          down        0          1.00000 
>> >> >  42   2.50000         osd.42            up  1.00000          1.00000 
>> >> >  43   1.59999         osd.43            up  1.00000          1.00000
>> >> 
>> >> >  after start of this osd, ceph tell 1.165 query  worked only for one call of this command
>> >> >> catch this:
>> >> 
>> >> >> https://pastebin.com/zKu06fJn
>> >> 
>> >> here is for pg 1.60:
>> >> 
>> >> https://pastebin.com/Xuk5iFXr
>> 
>> > Look at the bottom, after it says
>> 
>> >             "blocked": "peering is blocked due to down osds",
>> 
>> > Did the 1.165 pg recover?
>> 
>> No it didn't:
>> 
>> [root@cc1 ~]# ceph health detail
>> HEALTH_WARN 1 pgs down; 1 pgs incomplete; 1 pgs peering; 2 pgs stuck inactive
>> pg 1.165 is stuck inactive since forever, current state incomplete, last acting [67,88,48]
>> pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [68]
>> pg 1.60 is down+remapped+peering, acting [68]
>> pg 1.165 is incomplete, acting [67,88,48]
>> [root@cc1 ~]#

> Hrm.

>  ceph daemon osd.67 config set debug_osd 20
>  ceph daemon osd.67 config set debug_ms 1
>  ceph osd down 67

> and capture the log resulting log segment, then post it with 
> ceph-post-file.

args: -- /var/log/ceph/ceph-osd.67.log
/usr/bin/ceph-post-file: upload tag 05a02f14-8fd6-43da-9b9c-e42cd1fce560
/usr/bin/ceph-post-file: user: root@stor3
/usr/bin/ceph-post-file: will upload file /var/log/ceph/ceph-osd.67.log
sftp> mkdir post/05a02f14-8fd6-43da-9b9c-e42cd1fce560_root@stor3_8612f2d9-bb31-4d5e-b3e7-3722f8d13314
sftp> cd post/05a02f14-8fd6-43da-9b9c-e42cd1fce560_root@stor3_8612f2d9-bb31-4d5e-b3e7-3722f8d13314
sftp> put /tmp/tmp.rggR3suNMt user
sftp> put /var/log/ceph/ceph-osd.67.log

/usr/bin/ceph-post-file: copy the upload id below to share with a dev:

ceph-post-file: 05a02f14-8fd6-43da-9b9c-e42cd1fce560

-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 14:47                       ` Sage Weil
  2017-05-24 15:00                         ` Łukasz Chrustek
@ 2017-05-24 21:38                         ` Łukasz Chrustek
  2017-05-24 21:53                           ` Sage Weil
  1 sibling, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 21:38 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

>>
>> > This
>> 
>> osd 6 - isn't startable

> Disk completely 100% dead, or just borken enough that ceph-osd won't 
> start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> from this osd to recover any important writes on that osd.

>> osd 10, 37, 72 are startable

> With those started, I'd repeat the original sequence and get a fresh pg
> query to confirm that it still wants just osd.6.

> use ceph-objectstore-tool to export the pg from osd.6, stop some other
> ranodm osd (not one of these ones), import the pg into that osd, and start
> again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> point.  repeat with the same basic process with the other pg.

Here is the output from ceph-objectstore-tool - it also didn't succeed:

https://pastebin.com/7XGAHdKH


-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 21:38                         ` Łukasz Chrustek
@ 2017-05-24 21:53                           ` Sage Weil
  2017-05-24 22:09                             ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 21:53 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5583 bytes --]

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hello,
> 
> >>
> >> > This
> >> 
> >> osd 6 - isn't startable
> 
> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> > from this osd to recover any important writes on that osd.
> 
> >> osd 10, 37, 72 are startable
> 
> > With those started, I'd repeat the original sequence and get a fresh pg
> > query to confirm that it still wants just osd.6.
> 
> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> > ranodm osd (not one of these ones), import the pg into that osd, and start
> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> > point.  repeat with the same basic process with the other pg.
> 
> Here is output from ceph-objectstore-tool - also didn't success:
> 
> https://pastebin.com/7XGAHdKH

Hmm, btrfs:

2017-05-24 23:28:58.547456 7f500948e940 -1 
filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
/var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid 
losing new data

You could try setting --osd-use-stale-snap as suggested.

Is it the same error with the other one?


Looking at the log you sent earlier for 1.165 on osd.67, and the primary 
reports:

2017-05-24 21:37:11.505256 7efdbc1e5700  5 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] enter Started/Primary/Peering/GetLog
2017-05-24 21:37:11.505291 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.37 1.165( v 112598'67281552 (112574'67278547,112598'67281552] lb 1/56500165/rbd_data.674a3ed7dffd473.0000000000000b38/
head (NIBBLEWISE) local-les=112584 n=1 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505299 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.38 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/h
ead (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505306 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.48 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/h
ead (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505313 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.67 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/h
ead (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505319 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.88 1.165( empty local-les=0 n=0 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505326 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] choose_acting failed

in particular, osd 37 38 48 67 all have incomplete copies of the PG (they 
are mid-backfill) and 68 has nothing.  Some data is lost unless you can 
recover another OSD with that PG.

The set of OSDs that might have data are: 6,10,33,72,84

If that bears no fruit, then you can force last_backfill to report 
complete on one of those OSDs and it'll think it has all the data even 
though some of it is likely gone.  (We can pick one that is farther 
along... 38 48 and 67 seem to all match.)

sage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 21:53                           ` Sage Weil
@ 2017-05-24 22:09                             ` Łukasz Chrustek
  2017-05-24 22:27                               ` Sage Weil
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 22:09 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Hello,
>> 
>> >>
>> >> > This
>> >> 
>> >> osd 6 - isn't startable
>> 
>> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
>> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>> 
>> >> osd 10, 37, 72 are startable
>> 
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>> 
>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> > point.  repeat with the same basic process with the other pg.
>> 
>> Here is output from ceph-objectstore-tool - also didn't success:
>> 
>> https://pastebin.com/7XGAHdKH

> Hmm, btrfs:

> 2017-05-24 23:28:58.547456 7f500948e940 -1 
> filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
> /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> losing new data

> You could try setting --osd-use-stale-snap as suggested.

Yes... tried... and I simply get rided of 39GB data...

> Is it the same error with the other one?

Yes: https://pastebin.com/7XGAHdKH




> in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> are mid-backfill) and 68 has nothing.  Some data is lost unless you can
> recovery another OSD with that PG.

> The set of OSDs that might have data are: 6,10,33,72,84

> If that bears no fruit, then you can force last_backfill to report

How do I force last_backfill?

> complete on one of those OSDs and it'll think it has all the data even
> though some of it is likely gone.  (We can pick one that is farther 
> along... 38 48 and 67 seem to all match.)

> sage



-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 22:09                             ` Łukasz Chrustek
@ 2017-05-24 22:27                               ` Sage Weil
  2017-05-24 22:46                                 ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-24 22:27 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2337 bytes --]

On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> Hello,
> >> 
> >> >>
> >> >> > This
> >> >> 
> >> >> osd 6 - isn't startable
> >> 
> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
> >> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> > from this osd to recover any important writes on that osd.
> >> 
> >> >> osd 10, 37, 72 are startable
> >> 
> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> > query to confirm that it still wants just osd.6.
> >> 
> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> > ranodm osd (not one of these ones), import the pg into that osd, and start
> >> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> >> > point.  repeat with the same basic process with the other pg.
> >> 
> >> Here is output from ceph-objectstore-tool - also didn't success:
> >> 
> >> https://pastebin.com/7XGAHdKH
> 
> > Hmm, btrfs:
> 
> > 2017-05-24 23:28:58.547456 7f500948e940 -1 
> > filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> > losing new data
> 
> > You could try setting --osd-use-stale-snap as suggested.
> 
> Yes... tried... and I simply get rided of 39GB data...

What does "get rided" mean?



> 
> > Is it the same error with the other one?
> 
> Yes: https://pastebin.com/7XGAHdKH
> 
> 
> 
> 
> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> > are mid-backfill) and 68 has nothing.  Some data is lost unless you can
> > recovery another OSD with that PG.
> 
> > The set of OSDs that might have data are: 6,10,33,72,84
> 
> > If that bears no fruit, then you can force last_backfill to report
> 
> how to force last_backfill ?
> 
> > complete on one of those OSDs and it'll think it has all the data even
> > though some of it is likely gone.  (We can pick one that is farther 
> > along... 38 48 and 67 seem to all match.)
> 
> > sage
> 
> 
> 
> -- 
> Pozdrowienia,
>  Łukasz Chrustek
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 22:27                               ` Sage Weil
@ 2017-05-24 22:46                                 ` Łukasz Chrustek
  2017-05-25  2:06                                   ` Sage Weil
  2017-05-30 13:21                                   ` Sage Weil
  0 siblings, 2 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-24 22:46 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Thu, 25 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Hello,
>> >> 
>> >> >>
>> >> >> > This
>> >> >> 
>> >> >> osd 6 - isn't startable
>> >> 
>> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
>> >> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> >> > from this osd to recover any important writes on that osd.
>> >> 
>> >> >> osd 10, 37, 72 are startable
>> >> 
>> >> > With those started, I'd repeat the original sequence and get a fresh pg
>> >> > query to confirm that it still wants just osd.6.
>> >> 
>> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> >> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> >> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> >> > point.  repeat with the same basic process with the other pg.
>> >> 
>> >> Here is output from ceph-objectstore-tool - also didn't success:
>> >> 
>> >> https://pastebin.com/7XGAHdKH
>> 
>> > Hmm, btrfs:
>> 
>> > 2017-05-24 23:28:58.547456 7f500948e940 -1 
>> > filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
>> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
>> > losing new data
>> 
>> > You could try setting --osd-use-stale-snap as suggested.
>> 
>> Yes... tried... and I simply get rided of 39GB data...

> What does "get rided" mean?

according to this pastebin: https://pastebin.com/QPcpkjg4

ls -R /var/lib/ceph/osd/ceph-33/current/

/var/lib/ceph/osd/ceph-33/current/:

commit_op_seq  omap



/var/lib/ceph/osd/ceph-33/current/omap:

000003.log  CURRENT  LOCK  MANIFEST-000002

Earlier there were data files in there.

>> 
>> > Is it the same error with the other one?
>> 
>> Yes: https://pastebin.com/7XGAHdKH
>> 
>> 
>> 
>> 
>> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
>> > are mid-backfill) and 68 has nothing.  Some data is lost unless you can
>> > recovery another OSD with that PG.
>> 
>> > The set of OSDs that might have data are: 6,10,33,72,84
>> 
>> > If that bears no fruit, then you can force last_backfill to report
>> complete on one of those OSDs and it'll think it has all the data even
>> though some of it is likely gone.  (We can pick one that is farther
>> along... 38 48 and 67 seem to all match.

Can you explain what you mean by 'force last_backfill to report
complete'? The current value for PG 1.60 is MAX and for 1.165 is
1\/db616165\/rbd_data.ed9979641a9d82.000000000001dcee\/head

-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 22:46                                 ` Łukasz Chrustek
@ 2017-05-25  2:06                                   ` Sage Weil
  2017-05-25 11:22                                     ` Łukasz Chrustek
  2017-05-30 13:21                                   ` Sage Weil
  1 sibling, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-25  2:06 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3049 bytes --]

On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >> 
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Hello,
> >> >> 
> >> >> >>
> >> >> >> > This
> >> >> >> 
> >> >> >> osd 6 - isn't startable
> >> >> 
> >> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
> >> >> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> >> > from this osd to recover any important writes on that osd.
> >> >> 
> >> >> >> osd 10, 37, 72 are startable
> >> >> 
> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> >> > query to confirm that it still wants just osd.6.
> >> >> 
> >> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> >> > ranodm osd (not one of these ones), import the pg into that osd, and start
> >> >> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> >> >> > point.  repeat with the same basic process with the other pg.
> >> >> 
> >> >> Here is output from ceph-objectstore-tool - also didn't success:
> >> >> 
> >> >> https://pastebin.com/7XGAHdKH
> >> 
> >> > Hmm, btrfs:
> >> 
> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1 
> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> >> > losing new data
> >> 
> >> > You could try setting --osd-use-stale-snap as suggested.
> >> 
> >> Yes... tried... and I simply get rided of 39GB data...
> 
> > What does "get rided" mean?
> 
> according to this pastebin: https://pastebin.com/QPcpkjg4
> 
> ls -R /var/lib/ceph/osd/ceph-33/current/
> 
> /var/lib/ceph/osd/ceph-33/current/:
> 
> commit_op_seq  omap
> 
> 
> 
> /var/lib/ceph/osd/ceph-33/current/omap:
> 
> 000003.log  CURRENT  LOCK  MANIFEST-000002
> 
> earlier there were same data files.

Yeah, looks like all the data was deleted from the device.  :(

> >> 
> >> > Is it the same error with the other one?
> >> 
> >> Yes: https://pastebin.com/7XGAHdKH
> >> 
> >> 
> >> 
> >> 
> >> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> >> > are mid-backfill) and 68 has nothing.  Some data is lost unless you can
> >> > recovery another OSD with that PG.
> >> 
> >> > The set of OSDs that might have data are: 6,10,33,72,84
> >> 
> >> > If that bears no fruit, then you can force last_backfill to report
> >> complete on one of those OSDs and it'll think it has all the data even
> >> though some of it is likely gone.  (We can pick one that is farther
> >> along... 38 48 and 67 seem to all match.
> 
> Can  You  explain  what  do You mean by 'force last_backfill to report
> complete'  ?  The  current  value  for PG 1.60 is MAX and for 1.165 is
> 1\/db616165\/rbd_data.ed9979641a9d82.000000000001dcee\/head 

ceph-objectstore-tool has a mark-complete operation.  Do that on one of
the OSDs that has the more advanced last_backfill (like the one above).
After you restart it, the PG should recover.
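
(A rough sketch of that invocation - osd.48 here is just one of the
candidates named above, and the paths assume the same layout as the other
OSDs in this thread; run it with that OSD stopped:

  ceph-objectstore-tool --op mark-complete --pgid 1.165 \
      --data-path /var/lib/ceph/osd/ceph-48 \
      --journal-path /var/lib/ceph/osd/ceph-48/journal
  # then start that OSD again and watch whether the pg peers
)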

Good luck!
sage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-25  2:06                                   ` Sage Weil
@ 2017-05-25 11:22                                     ` Łukasz Chrustek
  2017-05-29 15:31                                       ` Łukasz Chrustek
  0 siblings, 1 reply; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-25 11:22 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

> On Thu, 25 May 2017, Łukasz Chrustek wrote:
>> Cześć,
>> 
>> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
>> >> Cześć,
>> >> 
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Hello,
>> >> >> 
>> >> >> >>
>> >> >> >> > This
>> >> >> >> 
>> >> >> >> osd 6 - isn't startable
>> >> >> 
>> >> >> > Disk completely 100% dead, or just borken enough that ceph-osd won't 
>> >> >> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> >> >> > from this osd to recover any important writes on that osd.
>> >> >> 
>> >> >> >> osd 10, 37, 72 are startable
>> >> >> 
>> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
>> >> >> > query to confirm that it still wants just osd.6.
>> >> >> 
>> >> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> >> >> > ranodm osd (not one of these ones), import the pg into that osd, and start
>> >> >> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> >> >> > point.  repeat with the same basic process with the other pg.
>> >> >> 
>> >> >> Here is output from ceph-objectstore-tool - also didn't success:
>> >> >> 
>> >> >> https://pastebin.com/7XGAHdKH
>> >> 
>> >> > Hmm, btrfs:
>> >> 
>> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1 
>> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
>> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
>> >> > losing new data
>> >> 
>> >> > You could try setting --osd-use-stale-snap as suggested.
>> >> 
>> >> Yes... tried... and I simply get rided of 39GB data...
>> 
>> > What does "get rided" mean?
>> 
>> according to this pastebin: https://pastebin.com/QPcpkjg4
>> 
>> ls -R /var/lib/ceph/osd/ceph-33/current/
>> 
>> /var/lib/ceph/osd/ceph-33/current/:
>> 
>> commit_op_seq  omap
>> 
>> 
>> 
>> /var/lib/ceph/osd/ceph-33/current/omap:
>> 
>> 000003.log  CURRENT  LOCK  MANIFEST-000002
>> 
>> earlier there were same data files.

> Yeah, looks like all the data was deleted from the device.  :(

>> >> 
>> >> > Is it the same error with the other one?
>> >> 
>> >> Yes: https://pastebin.com/7XGAHdKH
>> >> 
>> >> 
>> >> 
>> >> 
>> >> > in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
>> >> > are mid-backfill) and 68 has nothing.  Some data is lost unless you can
>> >> > recovery another OSD with that PG.
>> >> 
>> >> > The set of OSDs that might have data are: 6,10,33,72,84
>> >> 
>> >> > If that bears no fruit, then you can force last_backfill to report
>> >> complete on one of those OSDs and it'll think it has all the data even
>> >> though some of it is likely gone.  (We can pick one that is farther
>> >> along... 38 48 and 67 seem to all match.
>> 
>> Can  You  explain  what  do You mean by 'force last_backfill to report
>> complete'  ?  The  current  value  for PG 1.60 is MAX and for 1.165 is
>> 1\/db616165\/rbd_data.ed9979641a9d82.000000000001dcee\/head 

> ceph-objectstore-tool has a mark-complete operation.  Do that one one of
> the OSDs that has the more advanced last_backfill (like the one above).
> After you restart the PG should recover.

This is (https://pastebin.com/Jv2DpcB3) the pg dump_stuck output BEFORE running:
ceph-objectstore-tool --debug --op mark-complete --pgid 1.165 --data-path /var/lib/ceph/osd/ceph-48 --journal-path /var/lib/ceph/osd/ceph-48/journal --osd-use-stale-snap

As with the previous use of this tool, the data went away:

[root@stor5 /var/lib/ceph/osd/ceph-48]# du -sh current
20K     current

[root@stor5 /var/lib/ceph/osd/ceph-48/current]# ls -R
.:
commit_op_seq  nosnap  omap/

./omap:
000011.log  CURRENT  LOCK  LOG  LOG.old  MANIFEST-000010

After running ceph-objectstore-tool it looks like this:

ceph pg dump_stuck
ok
pg_stat state   up      up_primary      acting  acting_primary
1.39    active+remapped+backfilling     [11,4,39]       11      [5,39,70]       5
1.1a9   active+remapped+backfilling     [11,30,3]       11      [0,30,8]        0
1.b     active+remapped+backfilling     [11,36,94]      11      [38,97,70]      38
1.12f   active+remapped+backfilling     [14,11,47]      14      [14,5,69]       14
1.1d2   active+remapped+backfilling     [11,2,38]       11      [0,36,49]       0
1.133   active+remapped+backfilling     [42,11,83]      42      [42,89,21]      42
40.69   stale+active+undersized+degraded        [48]    48      [48]    48
1.9d    active+remapped+backfilling     [39,2,11]       39      [39,2,86]       39
1.a2    active+remapped+backfilling     [11,12,34]      11      [14,35,95]      14
1.10a   active+remapped+backfilling     [11,2,87]       11      [1,87,81]       1
1.70    active+remapped+backfilling     [14,39,11]      14      [14,39,4]       14
1.60    down+remapped+peering   [83,69,68]      83      [9]     9
1.eb    active+remapped+backfilling     [11,18,53]      11      [14,53,69]      14
1.8d    active+remapped+backfilling     [11,0,30]       11      [36,0,30]       36
1.118   active+remapped+backfilling     [34,11,12]      34      [34,20,86]      34
1.121   active+remapped+backfilling     [43,11,35]      43      [43,35,2]       43
1.177   active+remapped+backfilling     [14,1,11]       14      [14,1,38]       14
1.17c   active+remapped+backfilling     [5,94,11]       5       [5,94,7]        5
1.16d   active+remapped+backfilling     [96,11,53]      96      [96,52,9]       96
1.19a   active+remapped+backfilling     [11,0,14]       11      [0,17,35]       0
1.165   down+peering    [39,55,82]      39      [39,55,82]      39
1.1a    active+remapped+backfilling     [36,52,11]      36      [36,52,96]      36
1.e7    active+remapped+backfilling     [11,35,44]      11      [34,44,9]       34


Is there any chance to rescue this cluster?


-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-25 11:22                                     ` Łukasz Chrustek
@ 2017-05-29 15:31                                       ` Łukasz Chrustek
  0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-05-29 15:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

> ./omap:
> 000011.log  CURRENT  LOCK  LOG  LOG.old  MANIFEST-000010

> after running ceph-objectstore-tool it is:

> ceph pg dump_stuck
> ok
> pg_stat state   up      up_primary      acting  acting_primary
> 1.39    active+remapped+backfilling     [11,4,39]       11      [5,39,70]       5
> 1.1a9   active+remapped+backfilling     [11,30,3]       11      [0,30,8]        0
> 1.b     active+remapped+backfilling     [11,36,94]      11      [38,97,70]      38
> 1.12f   active+remapped+backfilling     [14,11,47]      14      [14,5,69]       14
> 1.1d2   active+remapped+backfilling     [11,2,38]       11      [0,36,49]       0
> 1.133   active+remapped+backfilling     [42,11,83]      42      [42,89,21]      42
> 40.69   stale+active+undersized+degraded        [48]    48      [48]    48
> 1.9d    active+remapped+backfilling     [39,2,11]       39      [39,2,86]       39
> 1.a2    active+remapped+backfilling     [11,12,34]      11      [14,35,95]      14
> 1.10a   active+remapped+backfilling     [11,2,87]       11      [1,87,81]       1
> 1.70    active+remapped+backfilling     [14,39,11]      14      [14,39,4]       14
> 1.60    down+remapped+peering   [83,69,68]      83      [9]     9
> 1.eb    active+remapped+backfilling     [11,18,53]      11      [14,53,69]      14
> 1.8d    active+remapped+backfilling     [11,0,30]       11      [36,0,30]       36
> 1.118   active+remapped+backfilling     [34,11,12]      34      [34,20,86]      34
> 1.121   active+remapped+backfilling     [43,11,35]      43      [43,35,2]       43
> 1.177   active+remapped+backfilling     [14,1,11]       14      [14,1,38]       14
> 1.17c   active+remapped+backfilling     [5,94,11]       5       [5,94,7]        5
> 1.16d   active+remapped+backfilling     [96,11,53]      96      [96,52,9]       96
> 1.19a   active+remapped+backfilling     [11,0,14]       11      [0,17,35]       0
> 1.165   down+peering    [39,55,82]      39      [39,55,82]      39
> 1.1a    active+remapped+backfilling     [36,52,11]      36      [36,52,96]      36
> 1.e7    active+remapped+backfilling     [11,35,44]      11      [34,44,9]       34


> Is there any chance to rescue this cluster ?

I have now turned off all OSDs and MONs, and after that turned two of the
three MONs back on to form a quorum. On all osd hosts every ceph process is
off. But ceph osd tree still shows old/stale data: https://pastebin.com/pVGLxAPs

Why doesn't ceph see that all the osds are down? What could be blocking it
like this?



-- 
Regards,
 Łukasz Chrustek


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-24 22:46                                 ` Łukasz Chrustek
  2017-05-25  2:06                                   ` Sage Weil
@ 2017-05-30 13:21                                   ` Sage Weil
  2017-06-10 22:45                                     ` Łukasz Chrustek
  1 sibling, 1 reply; 35+ messages in thread
From: Sage Weil @ 2017-05-30 13:21 UTC (permalink / raw)
  To: Łukasz Chrustek; +Cc: ceph-devel


On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Cześć,
> 
> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
> >> Cześć,
> >> 
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Hello,
> >> >> 
> >> >> >>
> >> >> >> > This
> >> >> >> 
> >> >> >> osd 6 - isn't startable
> >> >> 
> >> >> > Disk completely 100% dead, or just broken enough that ceph-osd won't 
> >> >> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> >> > from this osd to recover any important writes on that osd.
> >> >> 
> >> >> >> osd 10, 37, 72 are startable
> >> >> 
> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> >> > query to confirm that it still wants just osd.6.
> >> >> 
> >> >> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> >> > random osd (not one of these ones), import the pg into that osd, and start
> >> >> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> >> >> > point.  repeat with the same basic process with the other pg.
> >> >> 
> >> >> Here is the output from ceph-objectstore-tool - it also didn't succeed:
> >> >> 
> >> >> https://pastebin.com/7XGAHdKH
> >> 
> >> > Hmm, btrfs:
> >> 
> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1 
> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> >> > losing new data
> >> 
> >> > You could try setting --osd-use-stale-snap as suggested.
> >> 
> >> Yes... tried... and I simply get rided of 39GB data...
> 
> > What does "get rided" mean?
> 
> according to this pastebin: https://pastebin.com/QPcpkjg4
> 
> ls -R /var/lib/ceph/osd/ceph-33/current/
> 
> /var/lib/ceph/osd/ceph-33/current/:
> 
> commit_op_seq  omap
> 
> 
> 
> /var/lib/ceph/osd/ceph-33/current/omap:
> 
> 000003.log  CURRENT  LOCK  MANIFEST-000002
> 
> earlier there were some data files there.

Okay, sorry I took a while to get back to you.  It looks like I gave 
you bad advice here!  The 'nosnap' file means filestore was 
operating in non-snapshotting mode, and the --osd-use-stale-snap 
warning that it would lose data was real... it rolled back to an empty 
state and threw out the data on the device.  :( :(  I'm *very* sorry about 
this!  I haven't looked at or worked with the btrfs mode in ages (we 
don't recommend it and almost nobody uses it) but I should have been 
paying close attention.
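
(For anyone who ends up in the same spot, the safer order is presumably to
take an offline export of the affected PGs with ceph-objectstore-tool before
attempting any rollback or repair option, so the data can be re-imported
elsewhere if the attempt goes wrong. A minimal sketch with illustrative OSD
paths and pgid; the OSDs must be stopped first:)

# export the PG from the stopped OSD's filestore
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-84 \
    --journal-path /var/lib/ceph/osd/ceph-84/journal \
    --pgid 1.165 --op export --file /root/pg-1.165.export

# later, import it into another stopped OSD if needed
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-33 \
    --journal-path /var/lib/ceph/osd/ceph-33/journal \
    --op import --file /root/pg-1.165.export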

What is the state of the cluster now?

sage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Problem with query and any operation on PGs
  2017-05-30 13:21                                   ` Sage Weil
@ 2017-06-10 22:45                                     ` Łukasz Chrustek
  0 siblings, 0 replies; 35+ messages in thread
From: Łukasz Chrustek @ 2017-06-10 22:45 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

> Okay, sorry I took a while to get back to you.

Sorry too - most of the time I was focused on this problem.

>  It looks like I gave
> you bad advice here!  The 'nosnap' file means filestore was 
> operating in non-snapshotting mode, and the --osd-use-stale-snap 
> warning that it would lose data was real... it rolled back to an empty
> state and threw out the data on the device.  :( :(  I'm *very* sorry about
> this!  I haven't looked at or worked with the btrfs mode in ages (we 
> don't recommend it and almost nobody uses it) but I should have been 
> paying close attention.

Thank you for your time and effort, it was important to have such help.
There were many errors in the setup of this cluster. We didn't realize
that there could be so many strange things that were f...ed up...

> What is the state of the cluster now?

The cluster is dead. After a few more days of fighting with it we decided
to shut it down. We fixed the scripts for recovering volumes from a
turned-off ceph cluster (this one:
https://github.com/cmgitdream/ceph-rbd-recover-tool) and made them work
with the jewel release (10.2.7). I set up a brand new cluster on other
hardware, and the images are now being imported into it. With some direct
edits to the mysql databases of the openstack services we didn't have to
change anything for our clients from the horizon point of view. Once the
dust settles, we will push our changes to this tool back to github.
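
(A minimal sketch of the import step into the new cluster, assuming the
recovery tool produces one flat image file per volume; the pool and image
names below are illustrative:)

# import a recovered image file as a format 2 RBD image in the new cluster
rbd import --image-format 2 ./volume-example.img volumes/volume-example

# sanity-check the imported image
rbd info volumes/volume-example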

After the migration is finished we will try to bring this dead cluster back
up and take some more aggressive action to make it work anyway.



-- 
Regards,
 Lukasz


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-06-10 22:45 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <175484591.20170523135449@tlen.pl>
2017-05-23 12:48 ` Problem with query and any operation on PGs Łukasz Chrustek
2017-05-23 14:17   ` Sage Weil
2017-05-23 14:43     ` Łukasz Chrustek
     [not found]     ` <1464688590.20170523185052@tlen.pl>
2017-05-23 17:40       ` Sage Weil
2017-05-23 21:43         ` Łukasz Chrustek
2017-05-23 21:48           ` Sage Weil
2017-05-24 13:19             ` Łukasz Chrustek
2017-05-24 13:37               ` Sage Weil
2017-05-24 13:58                 ` Łukasz Chrustek
2017-05-24 14:02                   ` Sage Weil
2017-05-24 14:18                     ` Łukasz Chrustek
2017-05-24 14:47                       ` Sage Weil
2017-05-24 15:00                         ` Łukasz Chrustek
2017-05-24 15:07                           ` Łukasz Chrustek
2017-05-24 15:11                           ` Sage Weil
2017-05-24 15:24                             ` Łukasz Chrustek
2017-05-24 15:54                             ` Łukasz Chrustek
2017-05-24 16:02                               ` Łukasz Chrustek
2017-05-24 17:07                                 ` Łukasz Chrustek
2017-05-24 17:16                                   ` Sage Weil
2017-05-24 17:28                                     ` Łukasz Chrustek
2017-05-24 18:16                                       ` Sage Weil
2017-05-24 19:47                                         ` Łukasz Chrustek
2017-05-24 17:30                                     ` Łukasz Chrustek
2017-05-24 17:35                                       ` Łukasz Chrustek
2017-05-24 21:38                         ` Łukasz Chrustek
2017-05-24 21:53                           ` Sage Weil
2017-05-24 22:09                             ` Łukasz Chrustek
2017-05-24 22:27                               ` Sage Weil
2017-05-24 22:46                                 ` Łukasz Chrustek
2017-05-25  2:06                                   ` Sage Weil
2017-05-25 11:22                                     ` Łukasz Chrustek
2017-05-29 15:31                                       ` Łukasz Chrustek
2017-05-30 13:21                                   ` Sage Weil
2017-06-10 22:45                                     ` Łukasz Chrustek
