* Really slow cache-evict
@ 2015-10-23 23:13 Robert LeBlanc
  2015-10-24  0:47 ` Robert LeBlanc
  2015-10-24  1:00 ` Sage Weil
  0 siblings, 2 replies; 4+ messages in thread
From: Robert LeBlanc @ 2015-10-23 23:13 UTC (permalink / raw)
  To: ceph-devel

We are testing out cache tiering, but when evicting the cache on an
idle cluster it is extremely slow (10 objects per minute). Looking at
top, some of the OSD processes are busy, but the disks are idle across
the cluster. I upped debug_osd to 20/20 and saw millions of messages
per object like:

2015-10-23 16:52:15.952561 7fa42a1fd700 20 osd.169 39454
should_share_map client.329656 10.208.16.31:0/1012854 39454
2015-10-23 16:52:15.952605 7fa42a1fd700 15 osd.169 39454 enqueue_op
0x287ba400 prio 63 cost 450560 latency 0.000999
osd_op(client.329656.0:165865996
rbd_data.14143b5faf8a00.000000000000041f [stat,set-alloc-hint
object_size 8388608 write_size 8388608,write 860160~450560]
12.549948fa ack+ondisk+write+known_if_redirected e39454) v5
2015-10-23 16:52:15.952642 7fa43fbeb700 10 osd.169 39454 dequeue_op
0x287ba400 prio 63 cost 450560 latency 0.001036
osd_op(client.329656.0:165865996
rbd_data.14143b5faf8a00.000000000000041f [stat,set-alloc-hint
object_size 8388608 write_size 8388608,write 860160~450560]
12.549948fa ack+ondisk+write+known_if_redirected e39454) v5 pg pg[12.fa( v 39422'12054
(39253'9006,39422'12054] local-les=39453 n=284 ec=31174 les/c
39453/39453 39452/39452/39442) [169,186,153,136] r=0 lpr=39452
crt=39422'12051 lcod 0'0 mlcod 0'0 active+clean]
2015-10-23 16:52:15.952676 7fa43fbeb700 20 osd.169 pg_epoch: 39454
pg[12.fa( v 39422'12054 (39253'9006,39422'12054] local-les=39453 n=284
ec=31174 les/c 39453/39453 39452/39452/39442) [169,186,153,136] r=0
lpr=39452 crt=39422'12051 lcod 0'0 mlcod 0'0 active+clean]
op_has_sufficient_caps pool=12
(ssd-pool ) owner=0 need_read_cap=1 need_write_cap=1
need_class_read_cap=0 need_class_write_cap=0 -> NO
2015-10-23 16:52:15.952688 7fa42a1fd700 20 osd.169 39454
should_share_map client.329656 10.208.16.31:0/1012854 39454
2015-10-23 16:52:15.952701 7fa43fbeb700 10 osd.169 39454 dequeue_op
0x287ba400 finish
2015-10-23 16:52:15.952729 7fa42a1fd700 15 osd.169 39454 enqueue_op
0x287c7500 prio 63 cost 4096 latency 0.000092
osd_op(client.329656.0:165866002
rbd_data.14143b5faf8a00.000000000000041f [stat,set-alloc-hint
object_size 8388608 write_size 8388608,write 6369280~4096] 12.549948fa
ack+ondisk+write+known_if_redirected e39454) v5
2015-10-23 16:52:15.952762 7fa43d3e6700 10 osd.169 39454 dequeue_op
0x287c7500 prio 63 cost 4096 latency 0.000125
osd_op(client.329656.0:165866002
rbd_data.14143b5faf8a00.000000000000041f [stat,set-alloc-hint
object_size 8388608 write_size 8388608,write 6369280~4096] 12.549948fa
ack+ondisk+write+known_if_redirected e39454) v5 pg pg[12.fa( v 39422'12054
(39253'9006,39422'12054] local-les=39453 n=284 ec=31174 les/c
39453/39453 39452/39452/39442) [169,186,153,136] r=0 lpr=39452
crt=39422'12051 lcod 0'0 mlcod 0'0 active+clean]
2015-10-23 16:52:15.952787 7fa43d3e6700 20 osd.169 pg_epoch: 39454
pg[12.fa( v 39422'12054 (39253'9006,39422'12054] local-les=39453 n=284
ec=31174 les/c 39453/39453 39452/39452/39442) [169,186,153,136] r=0
lpr=39452 crt=39422'12051 lcod 0'0 mlcod 0'0 active+clean]
op_has_sufficient_caps pool=12
(ssd-pool ) owner=0 need_read_cap=1 need_write_cap=1
need_class_read_cap=0 need_class_write_cap=0 -> NO
2015-10-23 16:52:15.952832 7fa43d3e6700 10 osd.169 39454 dequeue_op
0x287c7500 finish

It looks like maybe the OP is bouncing between threads or something
like that and never getting dispatched correctly. This is on
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
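
One way to sanity-check whether a single op is really being re-queued
would be to count how often its request id shows up in the OSD log
(a rough sketch; the request id is taken from the snippet above and
the path assumes the default log location):

  # count enqueue/dequeue events for one request id on osd.169
  grep 'client.329656.0:165865996' /var/log/ceph/ceph-osd.169.log | grep -c enqueue_op
  grep 'client.329656.0:165865996' /var/log/ceph/ceph-osd.169.log | grep -c dequeue_op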

I followed the directions at [1] in the writeback section for
disabling the cache tier. I did this on another cluster recently and
didn't see this problem (only took an hour or two to evict 700GB of
data). I saw a very slow evict a while back, but just chalked it up to
btrfs being dumb.

[1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
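
For reference, the writeback-removal sequence in that section boils
down to roughly the following (a sketch, not a transcript of what I
ran; ssd-pool is the cache tier per the log above, the base pool name
is a placeholder, and the exact flags can differ between releases):

  ceph osd tier cache-mode ssd-pool forward   # stop admitting new writes to the cache
  rados -p ssd-pool cache-flush-evict-all     # flush dirty objects and evict the rest
  rados -p ssd-pool ls                        # should eventually come back empty
  ceph osd tier remove-overlay <base-pool>    # then unhook the tier
  ceph osd tier remove <base-pool> ssd-pool
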
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


* Re: Really slow cache-evict
  2015-10-23 23:13 Really slow cache-evict Robert LeBlanc
@ 2015-10-24  0:47 ` Robert LeBlanc
  2015-10-24  1:00 ` Sage Weil
  1 sibling, 0 replies; 4+ messages in thread
From: Robert LeBlanc @ 2015-10-24  0:47 UTC (permalink / raw)
  To: ceph-devel

There might be something deeper here. We have four copies with
min_size 2; if I take down half of the cluster, it evicts at the rate
I expect. If all OSDs are in, it slows down and blocks for a long time.

I also have 15 PGs that are inconsistent. I deep scrub them and they
become healthy; I stop and restart all the OSDs and they come back.
Only one of those PGs is in the base tier. I tried taking out two of
the OSDs for that PG and it doesn't really do much to help the
progress of the eviction.

Reducing the pool's size and min_size to 1 didn't help at all. It is
really weird.
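
Concretely, the commands involved in those experiments would be along
these lines (a sketch; the PG and OSD ids are just examples taken from
the earlier log):

  ceph pg deep-scrub 12.fa                # re-scrub an inconsistent PG
  ceph osd out 186                        # take one of that PG's OSDs out of the set
  ceph osd pool set ssd-pool size 1       # shrink replication for the test
  ceph osd pool set ssd-pool min_size 1
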
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1



* Re: Really slow cache-evict
  2015-10-23 23:13 Really slow cache-evict Robert LeBlanc
  2015-10-24  0:47 ` Robert LeBlanc
@ 2015-10-24  1:00 ` Sage Weil
  2015-10-24  1:49   ` Robert LeBlanc
  1 sibling, 1 reply; 4+ messages in thread
From: Sage Weil @ 2015-10-24  1:00 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: ceph-devel

On Fri, 23 Oct 2015, Robert LeBlanc wrote:
> It looks like maybe the OP is bouncing between threads or something
> like that and never getting dispatched correctly. This is on
> ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)

I don't see the same op looping in the snippet above (each only appears 
once for enqueue_op and once for dequeue_op), but it's hard to read after 
wordwrapping to 80 cols.  Do you mind posting a larger snippet of the log 
somewhere?

sage


* Re: Really slow cache-evict
  2015-10-24  1:00 ` Sage Weil
@ 2015-10-24  1:49   ` Robert LeBlanc
  0 siblings, 0 replies; 4+ messages in thread
From: Robert LeBlanc @ 2015-10-24  1:49 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

20 minutes of logs are located at

http://162.144.87.113/files/evict.log.xz

I was looking at the object rbd_data.14143b5faf8a00.0000000000000262,
for instance. There was no client I/O on the cluster during the
eviction.
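
For anyone digging through it, one way to pull a single object's trail
out of the compressed log is something like:

  xz -dc evict.log.xz | grep rbd_data.14143b5faf8a00.0000000000000262 | less
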
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Oct 23, 2015 at 7:00 PM, Sage Weil  wrote:
> On Fri, 23 Oct 2015, Robert LeBlanc wrote:
>> It looks like maybe the OP is bouncing between threads or something
>> like that and never getting dispatched correctly. This is on
>> ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
>
> I don't see the same op looping in the snippet above (each only appears
> once for enqueue_op and once for dequeue_op), but it's hard to read after
> wordwrapping to 80 cols.  Do you mind posting a larger snippet of the log
> somewhere?
>
> sage

end of thread, other threads:[~2015-10-24  1:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-23 23:13 Really slow cache-evict Robert LeBlanc
2015-10-24  0:47 ` Robert LeBlanc
2015-10-24  1:00 ` Sage Weil
2015-10-24  1:49   ` Robert LeBlanc
