* Cache tiering slow request issue: currently waiting for rw locks
@ 2014-08-29  6:43 Wang, Zhiqiang
  2014-08-29 14:18 ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-08-29  6:43 UTC (permalink / raw)
  To: 'ceph-devel@vger.kernel.org'

[-- Attachment #1: Type: text/plain, Size: 1793 bytes --]

Hi all,

I ran into this slow request issue some time ago. The problem is as follows: when running with cache tiering, 'slow request' warning messages like the ones below appear in the log file.

2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.996595 secs
2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks

Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.

Here is the root cause:
There is a cache miss on a read. During promotion, after copying the data from the base tier OSD, the cache tier primary OSD replicates the data to the other cache tier OSDs. Sometimes this takes quite a long time. During this period, the promoted object may be evicted because the cache tier is full. When the primary OSD finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier and promotes it again. This can loop several times, and we see the 'slow request' warnings in the logs. In theory, it could loop forever, and the request from the client would never finish.

There is a simple fix for this:
Add a field to the object state indicating whether a promotion is in progress. It is set to true after the data has been copied from the base tier and before the replication starts, and reset to false once the replication completes and the original client request starts to execute. Eviction is not allowed while this field is true.
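
A rough, self-contained sketch of the idea (simplified stand-in types only, not the actual Ceph object state or ObjectContext code; the member and function names are made up for illustration):

    // Hypothetical model of the proposed promotion flag.
    struct ObjectState {
      bool promoting = false;   // proposed field: promotion still in flight
    };

    struct CacheTierObject {
      ObjectState state;

      // Set once the copy-get from the base tier has completed, before the
      // promoted data is replicated to the other cache-tier OSDs.
      void start_promote_replication() { state.promoting = true; }

      // Cleared once the replication is acked and the original client
      // request is requeued for execution.
      void finish_promote() { state.promoting = false; }

      // The eviction agent must skip objects whose promotion is still in
      // flight; otherwise the promote/evict loop described above can repeat.
      bool may_evict() const { return !state.promoting; }
    };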

What do you think?

[-- Attachment #2: slow_request.log --]
[-- Type: application/octet-stream, Size: 43229 bytes --]

2014-08-29 10:17:53.673180 7f9b175ce700  1 -- 10.44.44.6:6812/56971 <== client.114176 10.44.44.7:0/2003169 1939 ==== osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 ==== 183+0+0 (2555001966 0 0) 0x3b156900 con 0x1e026420
2014-08-29 10:17:53.673200 7f9b175ce700  1 osd.20 26168 zqw _dispatch 0x3b156900 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4
2014-08-29 10:17:53.673250 7f9b07daf700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 0.000108 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1729 (0'0,26168'1729] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean]
2014-08-29 10:17:53.673275 7f9b07daf700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1729 (0'0,26168'1729] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:17:53.673235 7f9b175ce700  1 osd.20 26168 zqw enqueue_op 0x4cfbd1e0 prio 63 cost 0 latency 0.000086 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4
2014-08-29 10:17:53.673419 7f9b07daf700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1729 (0'0,26168'1729] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean] zqw promote_object cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45
2014-08-29 10:17:53.673467 7f9b07daf700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:19891 rb.0.17f56.6b8b4567.000000000935@snapdir [list-snaps] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x1f448240 con 0x1e2ad420
2014-08-29 10:17:53.673505 7f9b07daf700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:19892 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x1f0e4b40 con 0x1e2ad420
2014-08-29 10:17:53.674178 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1183 ==== osd_op_reply(19891 rb.0.17f56.6b8b4567.000000000935 [list-snaps] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+48 (932158059 0 984938543) 0x4cd23500 con 0x1e2ad420
2014-08-29 10:17:53.718270 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1184 ==== osd_op_reply(19892 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+4194379 (2177865586 0 2973039499) 0x2b509780 con 0x1e2ad420
2014-08-29 10:17:53.718441 7f9afe59c700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20386 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1730 snapset=0=[]:[] snapc=0=[]) v10 -- ?+4195168 0x2e742c00
2014-08-29 10:17:53.718558 7f9afe59c700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1730 (0'0,26168'1730] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1729 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean] zqw process_copy_chunk cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 tid 19892 (0) Success
2014-08-29 10:17:53.718594 7f9b04da9700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 0.045452 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1730 (0'0,26168'1730] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1729 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean]
2014-08-29 10:17:53.718625 7f9b04da9700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1730 (0'0,26168'1730] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1729 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:17:58.724414 7f9afc497700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1730 (0'0,26168'1730] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1729 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean] zqw agent_maybe_evict evicting cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45(26168'1730 osd.20.0:20386 wrlock_by=unknown.0.0:0 s 4194304 uv122580)
2014-08-29 10:17:58.724641 7f9afc497700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20479 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1731 snapset=0=[]:[] snapc=0=[]) v10 -- ?+138 0x4ce35c00
2014-08-29 10:18:01.397626 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12235 ==== osd_sub_op_reply(osd.20.0:20386 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (4279504679 0 0) 0x328c6280 con 0x1dee5580
2014-08-29 10:18:01.397653 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x328c6280 osd_sub_op_reply(osd.20.0:20386 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:01.397695 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1e8b8b40 prio 196 cost 0 latency 0.000090 osd_sub_op_reply(osd.20.0:20386 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:01.397711 7f9b06dad700  1 osd.20 26168 zqw dequeue_op 0x1e8b8b40 prio 196 cost 0 latency 0.000116 osd_sub_op_reply(osd.20.0:20386 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1731 (0'0,26168'1731] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1729 crt=0'0 lcod 26168'1728 mlcod 26168'1728 active+clean]
2014-08-29 10:18:01.397767 7f9b085b0700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 7.724625 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1731 (0'0,26168'1731] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean]
2014-08-29 10:18:01.397790 7f9b085b0700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1731 (0'0,26168'1731] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:01.397817 7f9b085b0700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1731 (0'0,26168'1731] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean] zqw promote_object cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45
2014-08-29 10:18:01.397861 7f9b085b0700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20033 rb.0.17f56.6b8b4567.000000000935@snapdir [list-snaps] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x1e0f8d80 con 0x1e2ad420
2014-08-29 10:18:01.397896 7f9b085b0700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20034 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x4d072d80 con 0x1e2ad420
2014-08-29 10:18:01.398219 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1197 ==== osd_op_reply(20033 rb.0.17f56.6b8b4567.000000000935 [list-snaps] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+48 (932158059 0 984938543) 0x1e73d780 con 0x1e2ad420
2014-08-29 10:18:01.417255 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1199 ==== osd_op_reply(20034 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+4194379 (2177865586 0 2973039499) 0x1a10780 con 0x1e2ad420
2014-08-29 10:18:01.417372 7f9afe59c700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20537 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1732 snapset=0=[]:[] snapc=0=[]) v10 -- ?+4195168 0x4cbe1800
2014-08-29 10:18:01.417474 7f9afe59c700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1732 (0'0,26168'1732] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean] zqw process_copy_chunk cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 tid 20034 (0) Success
2014-08-29 10:18:01.417508 7f9b035a6700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 7.744366 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1732 (0'0,26168'1732] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean]
2014-08-29 10:18:01.417527 7f9b035a6700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1732 (0'0,26168'1732] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:03.732540 7f9afc497700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1732 (0'0,26168'1732] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean] zqw agent_maybe_evict evicting cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45(26168'1732 osd.20.0:20537 wrlock_by=unknown.0.0:0 s 4194304 uv122580)
2014-08-29 10:18:03.732633 7f9afc497700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20599 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1733 snapset=0=[]:[] snapc=0=[]) v10 -- ?+138 0x229f6400
2014-08-29 10:18:05.980037 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12291 ==== osd_sub_op_reply(osd.20.0:20479 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (1987887600 0 0) 0x1e4f3000 con 0x1dee5580
2014-08-29 10:18:05.980052 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x1e4f3000 osd_sub_op_reply(osd.20.0:20479 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:05.980073 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1f9474b0 prio 196 cost 0 latency 0.000127 osd_sub_op_reply(osd.20.0:20479 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:05.980142 7f9b0bdb7700  1 osd.20 26168 zqw dequeue_op 0x1f9474b0 prio 196 cost 0 latency 0.000203 osd_sub_op_reply(osd.20.0:20479 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1733 (0'0,26168'1733] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1730 crt=0'0 lcod 26168'1729 mlcod 26168'1729 active+clean]
2014-08-29 10:18:06.172895 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12303 ==== osd_sub_op_reply(osd.20.0:20537 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (3629758774 0 0) 0x1f446a00 con 0x1dee5580
2014-08-29 10:18:06.172905 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x1f446a00 osd_sub_op_reply(osd.20.0:20537 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:06.172919 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1ecafe10 prio 196 cost 0 latency 0.000070 osd_sub_op_reply(osd.20.0:20537 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:06.172939 7f9b03da7700  1 osd.20 26168 zqw dequeue_op 0x1ecafe10 prio 196 cost 0 latency 0.000095 osd_sub_op_reply(osd.20.0:20537 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1733 (0'0,26168'1733] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1731 crt=0'0 lcod 26168'1730 mlcod 26168'1730 active+clean]
2014-08-29 10:18:06.172975 7f9b105c0700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 12.499833 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1733 (0'0,26168'1733] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean]
2014-08-29 10:18:06.172994 7f9b105c0700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1733 (0'0,26168'1733] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:06.173018 7f9b105c0700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1733 (0'0,26168'1733] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean] zqw promote_object cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45
2014-08-29 10:18:06.173054 7f9b105c0700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20123 rb.0.17f56.6b8b4567.000000000935@snapdir [list-snaps] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x20d45d80 con 0x1e2ad420
2014-08-29 10:18:06.173084 7f9b105c0700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20124 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x20b726c0 con 0x1e2ad420
2014-08-29 10:18:06.173416 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1205 ==== osd_op_reply(20123 rb.0.17f56.6b8b4567.000000000935 [list-snaps] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+48 (932158059 0 984938543) 0x22702500 con 0x1e2ad420
2014-08-29 10:18:06.182529 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1206 ==== osd_op_reply(20124 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+4194379 (2177865586 0 2973039499) 0x1f4c0000 con 0x1e2ad420
2014-08-29 10:18:06.182633 7f9afe59c700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20642 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1734 snapset=0=[]:[] snapc=0=[]) v10 -- ?+4195168 0x44f1a000
2014-08-29 10:18:06.182730 7f9afe59c700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1734 (0'0,26168'1734] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean] zqw process_copy_chunk cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 tid 20124 (0) Success
2014-08-29 10:18:06.182757 7f9b10dc1700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 12.509615 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1734 (0'0,26168'1734] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean]
2014-08-29 10:18:06.182778 7f9b10dc1700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1734 (0'0,26168'1734] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:08.739310 7f9afc497700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1734 (0'0,26168'1734] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean] zqw agent_maybe_evict evicting cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45(26168'1734 osd.20.0:20642 wrlock_by=unknown.0.0:0 s 4194304 uv122580)
2014-08-29 10:18:08.739420 7f9afc497700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20667 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1735 snapset=0=[]:[] snapc=0=[]) v10 -- ?+138 0x44e90000
2014-08-29 10:18:09.772786 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12327 ==== osd_sub_op_reply(osd.20.0:20599 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (3496573979 0 0) 0x44fb6000 con 0x1dee5580
2014-08-29 10:18:09.772797 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x44fb6000 osd_sub_op_reply(osd.20.0:20599 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:09.772814 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x19060f0 prio 196 cost 0 latency 0.000726 osd_sub_op_reply(osd.20.0:20599 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:09.772895 7f9b0edbd700  1 osd.20 26168 zqw dequeue_op 0x19060f0 prio 196 cost 0 latency 0.000815 osd_sub_op_reply(osd.20.0:20599 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1735 (0'0,26168'1735] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1732 crt=0'0 lcod 26168'1731 mlcod 26168'1731 active+clean]
2014-08-29 10:18:16.630541 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12373 ==== osd_sub_op_reply(osd.20.0:20642 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (1109421249 0 0) 0x22702c80 con 0x1dee5580
2014-08-29 10:18:16.630552 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x22702c80 osd_sub_op_reply(osd.20.0:20642 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:16.630567 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1f947a50 prio 196 cost 0 latency 0.000473 osd_sub_op_reply(osd.20.0:20642 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:16.630591 7f9b0f5be700  1 osd.20 26168 zqw dequeue_op 0x1f947a50 prio 196 cost 0 latency 0.000502 osd_sub_op_reply(osd.20.0:20642 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1735 (0'0,26168'1735] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1733 crt=0'0 lcod 26168'1732 mlcod 26168'1732 active+clean]
2014-08-29 10:18:16.630710 7f9b0edbd700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 22.957568 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1735 (0'0,26168'1735] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1734 crt=0'0 lcod 26168'1733 mlcod 26168'1733 active+clean]
2014-08-29 10:18:16.630738 7f9b0edbd700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1735 (0'0,26168'1735] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1734 crt=0'0 lcod 26168'1733 mlcod 26168'1733 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:16.630768 7f9b0edbd700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1735 (0'0,26168'1735] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1734 crt=0'0 lcod 26168'1733 mlcod 26168'1733 active+clean] zqw promote_object cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45
2014-08-29 10:18:16.630810 7f9b0edbd700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20269 rb.0.17f56.6b8b4567.000000000935@snapdir [list-snaps] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x1e1dd6c0 con 0x1e2ad420
2014-08-29 10:18:16.630850 7f9b0edbd700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20270 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x1f6f7480 con 0x1e2ad420
2014-08-29 10:18:16.631349 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1220 ==== osd_op_reply(20269 rb.0.17f56.6b8b4567.000000000935 [list-snaps] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+48 (932158059 0 984938543) 0x1f763280 con 0x1e2ad420
2014-08-29 10:18:16.634150 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12379 ==== osd_sub_op_reply(osd.20.0:20667 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (3086610680 0 0) 0x20a94780 con 0x1dee5580
2014-08-29 10:18:16.634158 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x20a94780 osd_sub_op_reply(osd.20.0:20667 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:16.634743 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1e818a50 prio 196 cost 0 latency 0.000311 osd_sub_op_reply(osd.20.0:20667 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:16.634798 7f9b08db1700  1 osd.20 26168 zqw dequeue_op 0x1e818a50 prio 196 cost 0 latency 0.000944 osd_sub_op_reply(osd.20.0:20667 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1735 (0'0,26168'1735] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1734 crt=0'0 lcod 26168'1733 mlcod 26168'1733 active+clean]
2014-08-29 10:18:16.640722 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1221 ==== osd_op_reply(20270 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+4194379 (2177865586 0 2973039499) 0x20d75c80 con 0x1e2ad420
2014-08-29 10:18:16.640847 7f9afe59c700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20773 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1736 snapset=0=[]:[] snapc=0=[]) v10 -- ?+4195168 0x1e8b4400
2014-08-29 10:18:16.640970 7f9afe59c700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1736 (0'0,26168'1736] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1735 crt=0'0 lcod 26168'1734 mlcod 26168'1734 active+clean] zqw process_copy_chunk cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 tid 20270 (0) Success
2014-08-29 10:18:16.641002 7f9b07daf700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 22.967860 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1736 (0'0,26168'1736] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1735 crt=0'0 lcod 26168'1734 mlcod 26168'1734 active+clean]
2014-08-29 10:18:16.641040 7f9b07daf700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1736 (0'0,26168'1736] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1735 crt=0'0 lcod 26168'1734 mlcod 26168'1734 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:21.644823 7f9afc497700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1736 (0'0,26168'1736] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1735 crt=0'0 lcod 26168'1734 mlcod 26168'1734 active+clean] zqw agent_maybe_evict evicting cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45(26168'1736 osd.20.0:20773 wrlock_by=unknown.0.0:0 s 4194304 uv122580)
2014-08-29 10:18:21.644884 7f9afc497700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20852 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1737 snapset=0=[]:[] snapc=0=[]) v10 -- ?+138 0x1ee1c400
2014-08-29 10:18:21.878816 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12429 ==== osd_sub_op_reply(osd.20.0:20773 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (699447218 0 0) 0x20a97500 con 0x1dee5580
2014-08-29 10:18:21.878825 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x20a97500 osd_sub_op_reply(osd.20.0:20773 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:21.878837 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1e5a4c30 prio 196 cost 0 latency 0.000134 osd_sub_op_reply(osd.20.0:20773 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:21.878854 7f9b085b0700  1 osd.20 26168 zqw dequeue_op 0x1e5a4c30 prio 196 cost 0 latency 0.000157 osd_sub_op_reply(osd.20.0:20773 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1737 (0'0,26168'1737] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1735 crt=0'0 lcod 26168'1734 mlcod 26168'1734 active+clean]
2014-08-29 10:18:21.878927 7f9b055aa700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 28.205785 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1737 (0'0,26168'1737] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean]
2014-08-29 10:18:21.878943 7f9b055aa700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1737 (0'0,26168'1737] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:21.878963 7f9b055aa700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1737 (0'0,26168'1737] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean] zqw promote_object cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45
2014-08-29 10:18:21.879094 7f9b055aa700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20351 rb.0.17f56.6b8b4567.000000000935@snapdir [list-snaps] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x2b449d80 con 0x1e2ad420
2014-08-29 10:18:21.879120 7f9b055aa700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20352 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x20c82d80 con 0x1e2ad420
2014-08-29 10:18:21.879430 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1229 ==== osd_op_reply(20351 rb.0.17f56.6b8b4567.000000000935 [list-snaps] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+48 (932158059 0 984938543) 0x1f619500 con 0x1e2ad420
2014-08-29 10:18:21.889361 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1230 ==== osd_op_reply(20352 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+4194379 (2177865586 0 2973039499) 0x1f619280 con 0x1e2ad420
2014-08-29 10:18:21.889565 7f9afe59c700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20853 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1738 snapset=0=[]:[] snapc=0=[]) v10 -- ?+4195168 0x3b204400
2014-08-29 10:18:21.889708 7f9afe59c700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1738 (0'0,26168'1738] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean] zqw process_copy_chunk cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 tid 20352 (0) Success
2014-08-29 10:18:21.889747 7f9b0bdb7700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 28.216605 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1738 (0'0,26168'1738] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean]
2014-08-29 10:18:21.889778 7f9b0bdb7700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1738 (0'0,26168'1738] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks
2014-08-29 10:18:26.649173 7f9afc497700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1738 (0'0,26168'1738] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean] zqw agent_maybe_evict evicting cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45(26168'1738 osd.20.0:20853 wrlock_by=unknown.0.0:0 s 4194304 uv122580)
2014-08-29 10:18:26.649242 7f9afc497700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20919 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1739 snapset=0=[]:[] snapc=0=[]) v10 -- ?+138 0x20d59800
2014-08-29 10:18:28.015252 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12471 ==== osd_sub_op_reply(osd.20.0:20852 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (555826943 0 0) 0x1f5e0000 con 0x1dee5580
2014-08-29 10:18:28.015259 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x1f5e0000 osd_sub_op_reply(osd.20.0:20852 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:28.015267 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1eed81e0 prio 196 cost 0 latency 0.000059 osd_sub_op_reply(osd.20.0:20852 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:28.015275 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12472 ==== osd_sub_op_reply(osd.20.0:20853 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (3154299610 0 0) 0x3277f280 con 0x1dee5580
2014-08-29 10:18:28.015283 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x3277f280 osd_sub_op_reply(osd.20.0:20853 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:28.015295 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1e966b40 prio 196 cost 0 latency 0.000060 osd_sub_op_reply(osd.20.0:20853 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:28.015353 7f9b0b5b6700  1 osd.20 26168 zqw dequeue_op 0x1eed81e0 prio 196 cost 0 latency 0.000148 osd_sub_op_reply(osd.20.0:20852 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1739 (0'0,26168'1739] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1736 crt=0'0 lcod 26168'1735 mlcod 26168'1735 active+clean]
2014-08-29 10:18:28.015395 7f9b0b5b6700  1 osd.20 26168 zqw dequeue_op 0x1e966b40 prio 196 cost 0 latency 0.000165 osd_sub_op_reply(osd.20.0:20853 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1739 (0'0,26168'1739] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1737 crt=0'0 lcod 26168'1736 mlcod 26168'1736 active+clean]
2014-08-29 10:18:28.015476 7f9b0cdb9700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 34.342334 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1739 (0'0,26168'1739] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean]
2014-08-29 10:18:28.015505 7f9b0cdb9700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1739 (0'0,26168'1739] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:28.015536 7f9b0cdb9700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1739 (0'0,26168'1739] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean] zqw promote_object cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45
2014-08-29 10:18:28.015583 7f9b0cdb9700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20447 rb.0.17f56.6b8b4567.000000000935@snapdir [list-snaps] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x44fefb40 con 0x1e2ad420
2014-08-29 10:18:28.015627 7f9b0cdb9700  1 -- 10.44.44.6:6814/56971 --> 10.44.44.6:6825/57237 -- osd_op(osd.20.26163:20448 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] 14.cf45084b ack+read+ignore_cache+ignore_overlay+map_snap_clone e26168) v4 -- ?+0 0x1efa2240 con 0x1e2ad420
2014-08-29 10:18:28.016235 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1235 ==== osd_op_reply(20447 rb.0.17f56.6b8b4567.000000000935 [list-snaps] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+48 (932158059 0 984938543) 0x1e054500 con 0x1e2ad420
2014-08-29 10:18:28.031943 7f9b12dc5700  1 -- 10.44.44.6:6814/56971 <== osd.13 10.44.44.6:6825/57237 1236 ==== osd_op_reply(20448 rb.0.17f56.6b8b4567.000000000935 [copy-get max 8388608] v0'0 uv122580 ondisk = 0) v6 ==== 199+0+4194379 (2177865586 0 2973039499) 0x1e054280 con 0x1e2ad420
2014-08-29 10:18:28.032069 7f9afe59c700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:20950 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1740 snapset=0=[]:[] snapc=0=[]) v10 -- ?+4195168 0x2b299000
2014-08-29 10:18:28.032210 7f9afe59c700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean] zqw process_copy_chunk cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 tid 20448 (0) Success
2014-08-29 10:18:28.032256 7f9b0cdb9700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 34.359114 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean]
2014-08-29 10:18:28.032282 7f9b0cdb9700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:31.406308 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12510 ==== osd_sub_op_reply(osd.20.0:20919 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (839358301 0 0) 0x44deba00 con 0x1dee5580
2014-08-29 10:18:31.406330 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x44deba00 osd_sub_op_reply(osd.20.0:20919 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:31.406360 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1f2592d0 prio 196 cost 0 latency 0.000078 osd_sub_op_reply(osd.20.0:20919 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:31.406378 7f9b03da7700  1 osd.20 26168 zqw dequeue_op 0x1f2592d0 prio 196 cost 0 latency 0.000103 osd_sub_op_reply(osd.20.0:20919 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1738 crt=0'0 lcod 26168'1737 mlcod 26168'1737 active+clean]
2014-08-29 10:18:31.406822 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12522 ==== osd_sub_op_reply(osd.20.0:20950 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (2383987118 0 0) 0x1f8d9280 con 0x1dee5580
2014-08-29 10:18:31.406837 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x1f8d9280 osd_sub_op_reply(osd.20.0:20950 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:31.406910 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x2b4bdc30 prio 196 cost 0 latency 0.000178 osd_sub_op_reply(osd.20.0:20950 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:31.406923 7f9b0f5be700  1 osd.20 26168 zqw dequeue_op 0x2b4bdc30 prio 196 cost 0 latency 0.000197 osd_sub_op_reply(osd.20.0:20950 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1739 crt=0'0 lcod 26168'1738 mlcod 26168'1738 active+clean]
2014-08-29 10:18:31.406987 7f9b0f5be700  1 osd.20 26168 zqw dequeue_op 0x4cfbd1e0 prio 63 cost 0 latency 37.733845 osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 pg pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1739 mlcod 26168'1739 active+clean]
2014-08-29 10:18:31.407010 7f9b0f5be700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1739 mlcod 26168'1739 active+clean] zqw do_op osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 may_read -> read-ordered flags ack+read
2014-08-29 10:18:31.407166 7f9b0f5be700  1 -- 10.44.44.6:6812/56971 --> 10.44.44.7:0/2003169 -- osd_op_reply(144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] v0'0 uv122580 ondisk = 0) v6 -- ?+0 0x1f8d9280 con 0x1e026420
2014-08-29 10:18:31.407195 7f9b0f5be700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1739 mlcod 26168'1739 active+clean] zqw finishing request, sending reply to client osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4
2014-08-29 10:18:31.655723 7f9afc497700  1 osd.20 pg_epoch: 26168 pg[45.4b( v 26168'1740 (0'0,26168'1740] local-les=26164 n=35 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 crt=0'0 lcod 26168'1739 mlcod 26168'1739 active+clean] zqw agent_maybe_evict evicting cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45(26168'1740 osd.20.0:20950 wrlock_by=unknown.0.0:0 s 4194304 uv122580)
2014-08-29 10:18:31.655816 7f9afc497700  1 -- 10.44.45.6:6806/56971 --> osd.19 10.44.45.5:6812/93696 -- osd_sub_op(osd.20.0:21056 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] v 26168'1741 snapset=0=[]:[] snapc=0=[]) v10 -- ?+138 0x1f2ba800
2014-08-29 10:18:34.337506 7f9b165cc700  1 -- 10.44.45.6:6806/56971 <== osd.19 10.44.45.5:6812/93696 12556 ==== osd_sub_op_reply(osd.20.0:21056 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 ==== 170+0+0 (1122054819 0 0) 0x1ef5c500 con 0x1dee5580
2014-08-29 10:18:34.337513 7f9b165cc700  1 osd.20 26168 zqw _dispatch 0x1ef5c500 osd_sub_op_reply(osd.20.0:21056 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:34.337540 7f9b165cc700  1 osd.20 26168 zqw enqueue_op 0x1e8182d0 prio 196 cost 0 latency 0.000053 osd_sub_op_reply(osd.20.0:21056 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2
2014-08-29 10:18:34.337557 7f9b025a4700  1 osd.20 26168 zqw dequeue_op 0x1e8182d0 prio 196 cost 0 latency 0.000090 osd_sub_op_reply(osd.20.0:21056 45.4b cf45084b/rb.0.17f56.6b8b4567.000000000935/head//45 [] ondisk, result = 0) v2 pg pg[45.4b( v 26168'1741 (0'0,26168'1741] local-les=26164 n=34 ec=25839 les/c 26164/26164 26163/26163/26163) [20,19] r=0 lpr=26163 luod=26168'1740 crt=0'0 lcod 26168'1739 mlcod 26168'1739 active+clean]


* Re: Cache tiering slow request issue: currently waiting for rw locks
  2014-08-29  6:43 Cache tiering slow request issue: currently waiting for rw locks Wang, Zhiqiang
@ 2014-08-29 14:18 ` Sage Weil
  2014-08-30  2:29   ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2014-08-29 14:18 UTC (permalink / raw)
  To: Wang, Zhiqiang; +Cc: 'ceph-devel@vger.kernel.org'

Hi,

I've opened http://tracker.ceph.com/issues/9285 to track this.

I think you're right--we need a check in agent_maybe_evict() that will 
skip objects that are being promoted.  I suspect a flag on the 
ObjectContext is enough?
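
Something along these lines, perhaps (a self-contained sketch only; the types below are simplified stand-ins, and the 'promoting' flag is an assumed member, not an existing ObjectContext field):

    #include <memory>

    // Simplified stand-ins, not the real Ceph classes.
    struct ObjectContext {
      bool promoting = false;   // assumed flag marking an in-flight promotion
    };
    using ObjectContextRef = std::shared_ptr<ObjectContext>;

    // Returns true if the agent is allowed to evict this object.
    bool agent_may_evict(const ObjectContextRef& obc) {
      if (obc->promoting)
        return false;           // skip objects that are still being promoted
      // ... the usual eviction policy checks would follow here ...
      return true;
    }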

sage


On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:

> Hi all,
> 
> I ran into this slow request issue some time ago. The problem is as follows: when running with cache tiering, 'slow request' warning messages like the ones below appear in the log file.
> 
> 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.996595 secs
> 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks
> 
> Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> 
> Here is the root cause:
> There is a cache miss on a read. During promotion, after copying the data from the base tier OSD, the cache tier primary OSD replicates the data to the other cache tier OSDs. Sometimes this takes quite a long time. During this period, the promoted object may be evicted because the cache tier is full. When the primary OSD finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier and promotes it again. This can loop several times, and we see the 'slow request' warnings in the logs. In theory, it could loop forever, and the request from the client would never finish.
> 
> There is a simple fix for this:
> Add a field to the object state indicating whether a promotion is in progress. It is set to true after the data has been copied from the base tier and before the replication starts, and reset to false once the replication completes and the original client request starts to execute. Eviction is not allowed while this field is true.
> 
> What do you think?
> 


* Re: Cache tiering slow request issue: currently waiting for rw locks
  2014-08-29 14:18 ` Sage Weil
@ 2014-08-30  2:29   ` Sage Weil
  2014-09-01  1:33     ` Wang, Zhiqiang
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2014-08-30  2:29 UTC (permalink / raw)
  To: Wang, Zhiqiang; +Cc: 'ceph-devel@vger.kernel.org'

Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if 
that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I ran into this slow request issue some time ago. The problem is as follows: when running with cache tiering, 'slow request' warning messages like the ones below appear in the log file.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > There is a cache miss on a read. During promotion, after copying the data from the base tier OSD, the cache tier primary OSD replicates the data to the other cache tier OSDs. Sometimes this takes quite a long time. During this period, the promoted object may be evicted because the cache tier is full. When the primary OSD finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier and promotes it again. This can loop several times, and we see the 'slow request' warnings in the logs. In theory, it could loop forever, and the request from the client would never finish.
> > 
> > There is a simple fix for this:
> > Add a field to the object state indicating whether a promotion is in progress. It is set to true after the data has been copied from the base tier and before the replication starts, and reset to false once the replication completes and the original client request starts to execute. Eviction is not allowed while this field is true.
> > 
> > What do you think?
> > 
> 
> 


* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-08-30  2:29   ` Sage Weil
@ 2014-09-01  1:33     ` Wang, Zhiqiang
  2014-09-02  6:54       ` Wang, Zhiqiang
  0 siblings, 1 reply; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-09-01  1:33 UTC (permalink / raw)
  To: Sage Weil; +Cc: 'ceph-devel@vger.kernel.org'

I don't think the object context is blocked at that time; it is unblocked after the data has been copied from the base tier, so that check doesn't address the problem here. Anyway, I'll try it and see.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Saturday, August 30, 2014 10:29 AM
To: Wang, Zhiqiang
Cc: 'ceph-devel@vger.kernel.org'
Subject: Re: Cache tiering slow request issue: currently waiting for rw locks

Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I ran into this slow request issue some time ago. The problem is as follows: when running with cache tiering, 'slow request' warning messages like the ones below appear in the log file.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > requests, 1 included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request 
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently 
> > waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > There is a cache miss on a read. During promotion, after copying the data from the base tier OSD, the cache tier primary OSD replicates the data to the other cache tier OSDs. Sometimes this takes quite a long time. During this period, the promoted object may be evicted because the cache tier is full. When the primary OSD finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier and promotes it again. This can loop several times, and we see the 'slow request' warnings in the logs. In theory, it could loop forever, and the request from the client would never finish.
> > 
> > There is a simple fix for this:
> > Add a field to the object state indicating whether a promotion is in progress. It is set to true after the data has been copied from the base tier and before the replication starts, and reset to false once the replication completes and the original client request starts to execute. Eviction is not allowed while this field is true.
> > 
> > What do you think?
> > 
> 
> 


* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-01  1:33     ` Wang, Zhiqiang
@ 2014-09-02  6:54       ` Wang, Zhiqiang
  2014-09-05  7:20         ` Wang, Zhiqiang
  0 siblings, 1 reply; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-09-02  6:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: 'ceph-devel@vger.kernel.org'

I tried the pull request; checking whether the object is blocked doesn't work. Actually, this check is already done in the function agent_work.

I tried to make a fix by adding a field/flag to the object context. This is not a good idea, for the following reasons:
1) If this field/flag is persistent, then resetting/clearing it requires persisting the change, which is not good for read requests.
2) If this field/flag is not persistent, then when the object context is removed from the 'object_contexts' cache, the flag is lost as well, and the object can be removed by a later eviction. The same issue still exists.

So I came up with a fix that adds a set to the ReplicatedPG class to hold all of the objects currently being promoted. The fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Please review and comment, thanks.
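
The shape of that fix, as a self-contained sketch (simplified stand-in types only; the actual change in the pull request is authoritative, and object_id_t below is just a placeholder for the real object identifier type):

    #include <set>
    #include <string>

    using object_id_t = std::string;   // placeholder for the real object id type

    // Hypothetical, simplified: track in-flight promotions in the PG itself,
    // so the information survives even if the object context is dropped from
    // the object_contexts cache.
    class CachePG {
      std::set<object_id_t> objects_promoting;

    public:
      // Called when a promotion starts.
      void start_promote(const object_id_t& oid)  { objects_promoting.insert(oid); }

      // Called once replication of the promoted data is acked and the
      // original client op is requeued.
      void finish_promote(const object_id_t& oid) { objects_promoting.erase(oid); }

      // The eviction agent consults the set before evicting an object.
      bool agent_may_evict(const object_id_t& oid) const {
        return objects_promoting.count(oid) == 0;
      }
    };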

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Monday, September 1, 2014 9:33 AM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

I don't think the object context is blocked at that time; it is unblocked after the data has been copied from the base tier, so that check doesn't address the problem here. Anyway, I'll try it and see.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Saturday, August 30, 2014 10:29 AM
To: Wang, Zhiqiang
Cc: 'ceph-devel@vger.kernel.org'
Subject: Re: Cache tiering slow request issue: currently waiting for rw locks

Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I've ran into this slow request issue some time ago. The problem is like this: when running with cache tieing, there are 'slow request' warning messages in the log file like below.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > requests, 1 included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently 
> > waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > There is a cache miss when doing read. During promotion, after copying the data from base tier osd, the cache tier primary osd replicates the data to other cache tier osds. Some times this takes quite a long time. During this period of time, the promoted object may be evicted because the cache tier is full. When the primary osd finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier, and do promotion again. This loops for several times, and we'll see the 'slow request' in the logs. Theoretically, this could loops forever, and the request from the client would never be finished.
> > 
> > There is a simple fix for this:
> > Add a field in the object state, indicating the status of the promotion. It's set to true after the copy of data from base tier and before the replication. It's reset to false after the replication and the original client request starts to execute. Evicting is not allowed when this field is true.
> > 
> > What do you think?
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-02  6:54       ` Wang, Zhiqiang
@ 2014-09-05  7:20         ` Wang, Zhiqiang
  2014-09-09  7:12           ` Wang, Zhiqiang
  0 siblings, 1 reply; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-09-05  7:20 UTC (permalink / raw)
  To: Sage Weil, 'sam.just@inktank.com'
  Cc: 'ceph-devel@vger.kernel.org'

I've replied to your review comments on the pull request https://github.com/ceph/ceph/pull/2374. Can you take a look? Thanks.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Tuesday, September 2, 2014 2:54 PM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

Tried the pull request, checking the object is blocked or not doesn't work. Actually this check is already done in function agent_work.

I tried to make a fix to add a field/flag to the object context. This is not a good idea for the following reasons:
1) If making this filed/flag to be a persistent one, when resetting/clearing this flag, we need to persist it. This is not good for read request.
2) If making this field/flag not to be a persistent one, when the object context is removed from the cache ' object_contexts', this field/flag is removed as well. This object is removed in the later evicting. The same issue still exists.

So, I came up with a fix to add a set in the class ReplicatedPG to hold all the promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Pls review and comment, thx.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Monday, September 1, 2014 9:33 AM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

I don't think the object context is blocked at that time. It is un-blocked after the copying of data from base tier. It doesn't address the problem here. Anyway, I'll try it and see.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Saturday, August 30, 2014 10:29 AM
To: Wang, Zhiqiang
Cc: 'ceph-devel@vger.kernel.org'
Subject: Re: Cache tiering slow request issue: currently waiting for rw locks

Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I've ran into this slow request issue some time ago. The problem is like this: when running with cache tieing, there are 'slow request' warning messages in the log file like below.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > requests, 1 included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently 
> > waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > There is a cache miss when doing read. During promotion, after copying the data from base tier osd, the cache tier primary osd replicates the data to other cache tier osds. Some times this takes quite a long time. During this period of time, the promoted object may be evicted because the cache tier is full. When the primary osd finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier, and do promotion again. This loops for several times, and we'll see the 'slow request' in the logs. Theoretically, this could loops forever, and the request from the client would never be finished.
> > 
> > There is a simple fix for this:
> > Add a field in the object state, indicating the status of the promotion. It's set to true after the copy of data from base tier and before the replication. It's reset to false after the replication and the original client request starts to execute. Evicting is not allowed when this field is true.
> > 
> > What do you think?
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-05  7:20         ` Wang, Zhiqiang
@ 2014-09-09  7:12           ` Wang, Zhiqiang
  2014-09-09  8:45             ` Chen, Xiaoxi
  2014-09-09 16:34             ` Sage Weil
  0 siblings, 2 replies; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-09-09  7:12 UTC (permalink / raw)
  To: Sage Weil, 'sam.just@inktank.com'
  Cc: 'ceph-devel@vger.kernel.org'

Pasting the conversation from the pull request here.

wonzhq commented 6 days ago
If checking ObjectContext::RWState::waiters and ObjectContext::RWState::count for the pending requests on this object, there is still a window which the problem can happen. That is after the promotion replication and requeuing the client request, and before dequeuing the client request. Should we loop the OSD::op_wq to check the pending requests on an object? Or adding something in the ObjectContext to remember the pending requests? @athanatos @liewegas

liewegas commented 10 hours ago
Hmm, that's true that there is still that window. Is it necessary that this is completely air-tight, though? As long as we avoid evicting a newly-promoted object before the request is processed we will win. I'm afraid that a complicated mechanism to cover this could introduce more complexity than we need.

wonzhq commented 2 minutes ago
Tried to use ObjectContext::RWState::count to check the pending request. In my testing, it hit the slow request just once. I checked the log, it exactly falls into the window we talked above. So with this solution, it's possible that we still hit this issue, but much less than before. Should we go ahead with this solution?

Sage/Sam, what do you think?
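
For reference, the check being discussed amounts to roughly the following in the eviction path (a sketch from memory; treat the exact member access as an assumption):

// Sketch only: defer eviction while requests are pending on the object.
// This closes most of the race, but not the window between requeuing the
// restarted client request and dequeuing it, since during that window the
// request shows up in neither rwstate.count nor rwstate.waiters.
bool has_pending_requests(ObjectContextRef obc)
{
  return obc->rwstate.count > 0 || !obc->rwstate.waiters.empty();
}

// in agent_maybe_evict():
//   if (has_pending_requests(obc))
//     return false;   // skip eviction for now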

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Friday, September 5, 2014 3:21 PM
To: Sage Weil; 'sam.just@inktank.com'
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

I made some comments based on your comments of the pull request https://github.com/ceph/ceph/pull/2374. Can you take a look? Thx.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Tuesday, September 2, 2014 2:54 PM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

Tried the pull request, checking the object is blocked or not doesn't work. Actually this check is already done in function agent_work.

I tried to make a fix to add a field/flag to the object context. This is not a good idea for the following reasons:
1) If making this filed/flag to be a persistent one, when resetting/clearing this flag, we need to persist it. This is not good for read request.
2) If making this field/flag not to be a persistent one, when the object context is removed from the cache ' object_contexts', this field/flag is removed as well. This object is removed in the later evicting. The same issue still exists.

So, I came up with a fix to add a set in the class ReplicatedPG to hold all the promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Pls review and comment, thx.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Monday, September 1, 2014 9:33 AM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

I don't think the object context is blocked at that time. It is un-blocked after the copying of data from base tier. It doesn't address the problem here. Anyway, I'll try it and see.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Saturday, August 30, 2014 10:29 AM
To: Wang, Zhiqiang
Cc: 'ceph-devel@vger.kernel.org'
Subject: Re: Cache tiering slow request issue: currently waiting for rw locks

Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I've ran into this slow request issue some time ago. The problem is like this: when running with cache tieing, there are 'slow request' warning messages in the log file like below.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > requests, 1 included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently 
> > waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > There is a cache miss when doing read. During promotion, after copying the data from base tier osd, the cache tier primary osd replicates the data to other cache tier osds. Some times this takes quite a long time. During this period of time, the promoted object may be evicted because the cache tier is full. When the primary osd finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier, and do promotion again. This loops for several times, and we'll see the 'slow request' in the logs. Theoretically, this could loops forever, and the request from the client would never be finished.
> > 
> > There is a simple fix for this:
> > Add a field in the object state, indicating the status of the promotion. It's set to true after the copy of data from base tier and before the replication. It's reset to false after the replication and the original client request starts to execute. Evicting is not allowed when this field is true.
> > 
> > What do you think?
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-09  7:12           ` Wang, Zhiqiang
@ 2014-09-09  8:45             ` Chen, Xiaoxi
  2014-09-09 16:34             ` Sage Weil
  1 sibling, 0 replies; 11+ messages in thread
From: Chen, Xiaoxi @ 2014-09-09  8:45 UTC (permalink / raw)
  To: Wang, Zhiqiang, Sage Weil, 'sam.just@inktank.com'
  Cc: 'ceph-devel@vger.kernel.org'

Can we set cache_min_evict_age to a reasonably large value (say 5 or 10 minutes) to work around the window? If a request cannot finish within minutes, that indicates there is some other issue in the cluster.
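
For what it's worth, the effect on the eviction path would be roughly the following (a sketch; the field names local_mtime and cache_min_evict_age reflect my reading of the code and may differ from the actual tree):

// Sketch only: a large cache_min_evict_age shields newly promoted objects.
bool old_enough_to_evict(const object_info_t& oi, const pg_pool_t& pool)
{
  utime_t age = ceph_clock_now(NULL) - oi.local_mtime;
  // With cache_min_evict_age set to, say, 300-600 seconds, an object that
  // was promoted a moment ago cannot be evicted before the requeued client
  // request gets a chance to run.
  return age >= utime_t(pool.cache_min_evict_age, 0);
}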

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Tuesday, September 9, 2014 3:13 PM
To: Sage Weil; 'sam.just@inktank.com'
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

Pasting the conversations in the pull request here.

wonzhq commented 6 days ago
If checking ObjectContext::RWState::waiters and ObjectContext::RWState::count for the pending requests on this object, there is still a window which the problem can happen. That is after the promotion replication and requeuing the client request, and before dequeuing the client request. Should we loop the OSD::op_wq to check the pending requests on an object? Or adding something in the ObjectContext to remember the pending requests? @athanatos @liewegas

liewegas commented 10 hours ago
Hmm, that's true that there is still that window. Is it necessary that this is completely air-tight, though? As long as we avoid evicting a newly-promoted object before the request is processed we will win. I'm afraid that a complicated mechanism to cover this could introduce more complexity than we need.

wonzhq commented 2 minutes ago
Tried to use ObjectContext::RWState::count to check the pending request. In my testing, it hit the slow request just once. I checked the log, it exactly falls into the window we talked above. So with this solution, it's possible that we still hit this issue, but much less than before. Should we go ahead with this solution?

Sage/Sam, what do you think?

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Friday, September 5, 2014 3:21 PM
To: Sage Weil; 'sam.just@inktank.com'
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

I made some comments based on your comments of the pull request https://github.com/ceph/ceph/pull/2374. Can you take a look? Thx.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Tuesday, September 2, 2014 2:54 PM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

Tried the pull request, checking the object is blocked or not doesn't work. Actually this check is already done in function agent_work.

I tried to make a fix to add a field/flag to the object context. This is not a good idea for the following reasons:
1) If making this filed/flag to be a persistent one, when resetting/clearing this flag, we need to persist it. This is not good for read request.
2) If making this field/flag not to be a persistent one, when the object context is removed from the cache ' object_contexts', this field/flag is removed as well. This object is removed in the later evicting. The same issue still exists.

So, I came up with a fix to add a set in the class ReplicatedPG to hold all the promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Pls review and comment, thx.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
Sent: Monday, September 1, 2014 9:33 AM
To: Sage Weil
Cc: 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

I don't think the object context is blocked at that time. It is un-blocked after the copying of data from base tier. It doesn't address the problem here. Anyway, I'll try it and see.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Saturday, August 30, 2014 10:29 AM
To: Wang, Zhiqiang
Cc: 'ceph-devel@vger.kernel.org'
Subject: Re: Cache tiering slow request issue: currently waiting for rw locks

Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
> 
> I've opened http://tracker.ceph.com/issues/9285 to track this.
> 
> I think you're right--we need a check in agent_maybe_evict() that will 
> skip objects that are being promoted.  I suspect a flag on the 
> ObjectContext is enough?
> 
> sage
> 
> 
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> 
> > Hi all,
> > 
> > I've ran into this slow request issue some time ago. The problem is like this: when running with cache tieing, there are 'slow request' warning messages in the log file like below.
> > 
> > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > requests, 1 included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently 
> > waiting for rw locks
> > 
> > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > 
> > Here is the root cause:
> > There is a cache miss when doing read. During promotion, after copying the data from base tier osd, the cache tier primary osd replicates the data to other cache tier osds. Some times this takes quite a long time. During this period of time, the promoted object may be evicted because the cache tier is full. When the primary osd finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier, and do promotion again. This loops for several times, and we'll see the 'slow request' in the logs. Theoretically, this could loops forever, and the request from the client would never be finished.
> > 
> > There is a simple fix for this:
> > Add a field in the object state, indicating the status of the promotion. It's set to true after the copy of data from base tier and before the replication. It's reset to false after the replication and the original client request starts to execute. Evicting is not allowed when this field is true.
> > 
> > What do you think?
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-09  7:12           ` Wang, Zhiqiang
  2014-09-09  8:45             ` Chen, Xiaoxi
@ 2014-09-09 16:34             ` Sage Weil
  2014-09-10  1:13               ` Wang, Zhiqiang
  2014-09-16  8:38               ` Wang, Zhiqiang
  1 sibling, 2 replies; 11+ messages in thread
From: Sage Weil @ 2014-09-09 16:34 UTC (permalink / raw)
  To: Wang, Zhiqiang
  Cc: 'sam.just@inktank.com', 'ceph-devel@vger.kernel.org'

On Tue, 9 Sep 2014, Wang, Zhiqiang wrote:
> Pasting the conversations in the pull request here.
> 
> wonzhq commented 6 days ago
> If checking ObjectContext::RWState::waiters and ObjectContext::RWState::count for the pending requests on this object, there is still a window which the problem can happen. That is after the promotion replication and requeuing the client request, and before dequeuing the client request. Should we loop the OSD::op_wq to check the pending requests on an object? Or adding something in the ObjectContext to remember the pending requests? @athanatos @liewegas
> 
> liewegas commented 10 hours ago
> Hmm, that's true that there is still that window. Is it necessary that this is completely air-tight, though? As long as we avoid evicting a newly-promoted object before the request is processed we will win. I'm afraid that a complicated mechanism to cover this could introduce more complexity than we need.
> 
> wonzhq commented 2 minutes ago
> Tried to use ObjectContext::RWState::count to check the pending request. In my testing, it hit the slow request just once. I checked the log, it exactly falls into the window we talked above. So with this solution, it's possible that we still hit this issue, but much less than before. Should we go ahead with this solution?
>
> Sage/Sam, what do you think?

I think it is definitely worth adding that check, even if it doesn't catch 
the requeue case, because it is still useful to defer eviction if there is 
a request queued for that object.  That seems true at least in the 
writeback cache mode... perhaps not so in other modes like forward.

I'm still not sure what would close the hole reliably.  Perhaps a flag on 
the obc indicating whether any request has touched it since the initial 
promote?  Maybe that, coupled with a time limit (so that eventually we can 
still evict in case the original request never gets processed... e.g. 
because the client disconnected before it was requeued or something).

?

sage

 > 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> Sent: Friday, September 5, 2014 3:21 PM
> To: Sage Weil; 'sam.just@inktank.com'
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for rw locks
> 
> I made some comments based on your comments of the pull request https://github.com/ceph/ceph/pull/2374. Can you take a look? Thx.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> Sent: Tuesday, September 2, 2014 2:54 PM
> To: Sage Weil
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for rw locks
> 
> Tried the pull request, checking the object is blocked or not doesn't work. Actually this check is already done in function agent_work.
> 
> I tried to make a fix to add a field/flag to the object context. This is not a good idea for the following reasons:
> 1) If making this filed/flag to be a persistent one, when resetting/clearing this flag, we need to persist it. This is not good for read request.
> 2) If making this field/flag not to be a persistent one, when the object context is removed from the cache ' object_contexts', this field/flag is removed as well. This object is removed in the later evicting. The same issue still exists.
> 
> So, I came up with a fix to add a set in the class ReplicatedPG to hold all the promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Pls review and comment, thx.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> Sent: Monday, September 1, 2014 9:33 AM
> To: Sage Weil
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for rw locks
> 
> I don't think the object context is blocked at that time. It is un-blocked after the copying of data from base tier. It doesn't address the problem here. Anyway, I'll try it and see.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Saturday, August 30, 2014 10:29 AM
> To: Wang, Zhiqiang
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: Re: Cache tiering slow request issue: currently waiting for rw locks
> 
> Hi,
> 
> Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?
> 
> Thanks!
> sage
> 
> 
> On Fri, 29 Aug 2014, Sage Weil wrote:
> 
> > Hi,
> > 
> > I've opened http://tracker.ceph.com/issues/9285 to track this.
> > 
> > I think you're right--we need a check in agent_maybe_evict() that will 
> > skip objects that are being promoted.  I suspect a flag on the 
> > ObjectContext is enough?
> > 
> > sage
> > 
> > 
> > On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> > 
> > > Hi all,
> > > 
> > > I've ran into this slow request issue some time ago. The problem is like this: when running with cache tieing, there are 'slow request' warning messages in the log file like below.
> > > 
> > > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > > requests, 1 included below; oldest blocked for > 30.996595 secs
> > > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow request
> > > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 currently 
> > > waiting for rw locks
> > > 
> > > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > > 
> > > Here is the root cause:
> > > There is a cache miss when doing read. During promotion, after copying the data from base tier osd, the cache tier primary osd replicates the data to other cache tier osds. Some times this takes quite a long time. During this period of time, the promoted object may be evicted because the cache tier is full. When the primary osd finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier, and do promotion again. This loops for several times, and we'll see the 'slow request' in the logs. Theoretically, this could loops forever, and the request from the client would never be finished.
> > > 
> > > There is a simple fix for this:
> > > Add a field in the object state, indicating the status of the promotion. It's set to true after the copy of data from base tier and before the replication. It's reset to false after the replication and the original client request starts to execute. Evicting is not allowed when this field is true.
> > > 
> > > What do you think?
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> > in the body of a message to majordomo@vger.kernel.org More majordomo 
> > info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-09 16:34             ` Sage Weil
@ 2014-09-10  1:13               ` Wang, Zhiqiang
  2014-09-16  8:38               ` Wang, Zhiqiang
  1 sibling, 0 replies; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-09-10  1:13 UTC (permalink / raw)
  To: Sage Weil
  Cc: 'sam.just@inktank.com', 'ceph-devel@vger.kernel.org'

> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Wednesday, September 10, 2014 12:35 AM
> To: Wang, Zhiqiang
> Cc: 'sam.just@inktank.com'; 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for rw locks
> 
> On Tue, 9 Sep 2014, Wang, Zhiqiang wrote:
> > Pasting the conversations in the pull request here.
> >
> > wonzhq commented 6 days ago
> > If checking ObjectContext::RWState::waiters and
> > ObjectContext::RWState::count for the pending requests on this object,
> > there is still a window which the problem can happen. That is after
> > the promotion replication and requeuing the client request, and before
> > dequeuing the client request. Should we loop the OSD::op_wq to check
> > the pending requests on an object? Or adding something in the
> > ObjectContext to remember the pending requests? @athanatos @liewegas
> >
> > liewegas commented 10 hours ago
> > Hmm, that's true that there is still that window. Is it necessary that this is
> completely air-tight, though? As long as we avoid evicting a newly-promoted
> object before the request is processed we will win. I'm afraid that a
> complicated mechanism to cover this could introduce more complexity than we
> need.
> >
> > wonzhq commented 2 minutes ago
> > Tried to use ObjectContext::RWState::count to check the pending request. In
> my testing, it hit the slow request just once. I checked the log, it exactly falls
> into the window we talked above. So with this solution, it's possible that we still
> hit this issue, but much less than before. Should we go ahead with this
> solution?
> >
> > Sage/Sam, what do you think?
> 
> I think it is definitely worth adding that check, even if it doesn't catch the
> requeue case, because it is still useful to defer eviction if there is a request
> queued for that object.  That seems true at least in the writeback cache
> mode... perhaps not so in other modes like forward.

OK, I'll add this check first. But why don't we want this check in the forward mode?

> I'm still not sure what would close the hole reliably.  Perhaps a flag on the obc
> indicating whether any request has touched it since the initial promote?
> Maybe that, coupled with a time limit (so that eventually we can still evict in
> case the original request never gets processed... e.g.
> because the client disconnected before it was requeued or something).

I tried adding a flag to the obc before, but for some reason it didn't work well. I set the
flag at the initial promote, but later, when checking it after the promotion, the flag was
sometimes not set. I haven't figured out why yet. My guess is that we don't hold every obc
in 'object_contexts': an obc can be dropped from that cache under some conditions (e.g.,
when it reaches its size limit). So if an obc is dropped and the flag is not persisted, the
flag is lost the next time we do 'get_object_context'. Is this right?

> 
> ?
> 
> sage
> 
>  >
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> > Sent: Friday, September 5, 2014 3:21 PM
> > To: Sage Weil; 'sam.just@inktank.com'
> > Cc: 'ceph-devel@vger.kernel.org'
> > Subject: RE: Cache tiering slow request issue: currently waiting for
> > rw locks
> >
> > I made some comments based on your comments of the pull request
> https://github.com/ceph/ceph/pull/2374. Can you take a look? Thx.
> >
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> > Sent: Tuesday, September 2, 2014 2:54 PM
> > To: Sage Weil
> > Cc: 'ceph-devel@vger.kernel.org'
> > Subject: RE: Cache tiering slow request issue: currently waiting for
> > rw locks
> >
> > Tried the pull request, checking the object is blocked or not doesn't work.
> Actually this check is already done in function agent_work.
> >
> > I tried to make a fix to add a field/flag to the object context. This is not a good
> idea for the following reasons:
> > 1) If making this filed/flag to be a persistent one, when resetting/clearing this
> flag, we need to persist it. This is not good for read request.
> > 2) If making this field/flag not to be a persistent one, when the object context
> is removed from the cache ' object_contexts', this field/flag is removed as well.
> This object is removed in the later evicting. The same issue still exists.
> >
> > So, I came up with a fix to add a set in the class ReplicatedPG to hold all the
> promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is
> tested and works well. Pls review and comment, thx.
> >
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> > Sent: Monday, September 1, 2014 9:33 AM
> > To: Sage Weil
> > Cc: 'ceph-devel@vger.kernel.org'
> > Subject: RE: Cache tiering slow request issue: currently waiting for
> > rw locks
> >
> > I don't think the object context is blocked at that time. It is un-blocked after
> the copying of data from base tier. It doesn't address the problem here.
> Anyway, I'll try it and see.
> >
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
> > Sent: Saturday, August 30, 2014 10:29 AM
> > To: Wang, Zhiqiang
> > Cc: 'ceph-devel@vger.kernel.org'
> > Subject: Re: Cache tiering slow request issue: currently waiting for
> > rw locks
> >
> > Hi,
> >
> > Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if
> that addresses the behavior you saw?
> >
> > Thanks!
> > sage
> >
> >
> > On Fri, 29 Aug 2014, Sage Weil wrote:
> >
> > > Hi,
> > >
> > > I've opened http://tracker.ceph.com/issues/9285 to track this.
> > >
> > > I think you're right--we need a check in agent_maybe_evict() that
> > > will skip objects that are being promoted.  I suspect a flag on the
> > > ObjectContext is enough?
> > >
> > > sage
> > >
> > >
> > > On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've ran into this slow request issue some time ago. The problem is like
> this: when running with cache tieing, there are 'slow request' warning
> messages in the log file like below.
> > > >
> > > > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow
> > > > requests, 1 included below; oldest blocked for > 30.996595 secs
> > > > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow
> > > > request
> > > > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142:
> > > > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935
> > > > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4
> > > > currently waiting for rw locks
> > > >
> > > > Recently I made some changes to the log, captured this problem, and
> finally figured out its root cause. You can check the attachment for the logs.
> > > >
> > > > Here is the root cause:
> > > > There is a cache miss when doing read. During promotion, after copying
> the data from base tier osd, the cache tier primary osd replicates the data to
> other cache tier osds. Some times this takes quite a long time. During this
> period of time, the promoted object may be evicted because the cache tier is
> full. When the primary osd finally gets the replication response and restarts the
> original read request, it doesn't find the object in the cache tier, and do
> promotion again. This loops for several times, and we'll see the 'slow request'
> in the logs. Theoretically, this could loops forever, and the request from the
> client would never be finished.
> > > >
> > > > There is a simple fix for this:
> > > > Add a field in the object state, indicating the status of the promotion. It's
> set to true after the copy of data from base tier and before the replication. It's
> reset to false after the replication and the original client request starts to
> execute. Evicting is not allowed when this field is true.
> > > >
> > > > What do you think?
> > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > in the body of a message to majordomo@vger.kernel.org More majordomo
> > > info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> >
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Cache tiering slow request issue: currently waiting for rw locks
  2014-09-09 16:34             ` Sage Weil
  2014-09-10  1:13               ` Wang, Zhiqiang
@ 2014-09-16  8:38               ` Wang, Zhiqiang
  1 sibling, 0 replies; 11+ messages in thread
From: Wang, Zhiqiang @ 2014-09-16  8:38 UTC (permalink / raw)
  To: Sage Weil
  Cc: 'sam.just@inktank.com', 'ceph-devel@vger.kernel.org'

So, for this issue:
1) Checking the pending requests on the obc can't completely solve the problem because of the requeue window.
2) It's also not a good idea to put a flag in the obc to indicate that it is being promoted, because the obc may be deleted after the promotion replication. If we make the flag persistent on disk (as part of object_info_t) before deletion, we would need to persist it again when clearing it later.

In my original pull request I added a set in the PG to track all the objects being promoted. I've updated it with the following changes:
1) Turn the set into a map that records the time each promotion starts. If an object has been in the map for more than 60s when eviction considers it, we evict it anyway.
2) Address Sam's comments: clear the map in on_change, and remove an object from the map if the corresponding request is discarded.
These updates are pushed to the original pull request https://github.com/ceph/ceph/pull/2374; a rough sketch is below. Please take a look, thanks.
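
Roughly, the updated approach looks like this (illustrative names only; the pull request has the real code):

// Sketch only: map from object to the time its promotion started.
// In ReplicatedPG:
std::map<hobject_t, utime_t> objects_promoting;

void ReplicatedPG::start_promote(const hobject_t& soid)
{
  objects_promoting[soid] = ceph_clock_now(NULL);
}

bool ReplicatedPG::agent_maybe_evict(ObjectContextRef& obc)
{
  std::map<hobject_t, utime_t>::iterator p =
    objects_promoting.find(obc->obs.oi.soid);
  if (p != objects_promoting.end() &&
      ceph_clock_now(NULL) - p->second < utime_t(60, 0)) {
    // promotion in flight for less than 60s: defer eviction so the requeued
    // client request can run against the promoted copy
    return false;
  }
  // after 60s we evict anyway, in case the original request was discarded
  // (e.g. the client disconnected before it was requeued)
  // ... existing eviction logic ...
  return true;
}

void ReplicatedPG::on_change(ObjectStore::Transaction *t)
{
  // peering change: requeued requests are dropped, so drop the tracking too
  objects_promoting.clear();
  // ... existing on_change logic ...
}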

-----Original Message-----
From: Sage Weil [mailto:sweil@redhat.com] 
Sent: Wednesday, September 10, 2014 12:35 AM
To: Wang, Zhiqiang
Cc: 'sam.just@inktank.com'; 'ceph-devel@vger.kernel.org'
Subject: RE: Cache tiering slow request issue: currently waiting for rw locks

On Tue, 9 Sep 2014, Wang, Zhiqiang wrote:
> Pasting the conversations in the pull request here.
> 
> wonzhq commented 6 days ago
> If checking ObjectContext::RWState::waiters and 
> ObjectContext::RWState::count for the pending requests on this object, 
> there is still a window which the problem can happen. That is after 
> the promotion replication and requeuing the client request, and before 
> dequeuing the client request. Should we loop the OSD::op_wq to check 
> the pending requests on an object? Or adding something in the 
> ObjectContext to remember the pending requests? @athanatos @liewegas
> 
> liewegas commented 10 hours ago
> Hmm, that's true that there is still that window. Is it necessary that this is completely air-tight, though? As long as we avoid evicting a newly-promoted object before the request is processed we will win. I'm afraid that a complicated mechanism to cover this could introduce more complexity than we need.
> 
> wonzhq commented 2 minutes ago
> Tried to use ObjectContext::RWState::count to check the pending request. In my testing, it hit the slow request just once. I checked the log, it exactly falls into the window we talked above. So with this solution, it's possible that we still hit this issue, but much less than before. Should we go ahead with this solution?
>
> Sage/Sam, what do you think?

I think it is definitely worth adding that check, even if it doesn't catch the requeue case, because it is still useful to defer eviction if there is a request queued for that object.  That seems true at least in the writeback cache mode... perhaps not so in other modes like forward.

I'm still not sure what would close the hole reliably.  Perhaps a flag on the obc indicating whether any request has touched it since the initial promote?  Maybe that, coupled with a time limit (so that eventually we can still evict in case the original request never gets processed... e.g. 
because the client disconnected before it was requeued or something).

?

sage

 > 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> Sent: Friday, September 5, 2014 3:21 PM
> To: Sage Weil; 'sam.just@inktank.com'
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for 
> rw locks
> 
> I made some comments based on your comments of the pull request https://github.com/ceph/ceph/pull/2374. Can you take a look? Thx.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> Sent: Tuesday, September 2, 2014 2:54 PM
> To: Sage Weil
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for 
> rw locks
> 
> Tried the pull request, checking the object is blocked or not doesn't work. Actually this check is already done in function agent_work.
> 
> I tried to make a fix to add a field/flag to the object context. This is not a good idea for the following reasons:
> 1) If making this filed/flag to be a persistent one, when resetting/clearing this flag, we need to persist it. This is not good for read request.
> 2) If making this field/flag not to be a persistent one, when the object context is removed from the cache ' object_contexts', this field/flag is removed as well. This object is removed in the later evicting. The same issue still exists.
> 
> So, I came up with a fix to add a set in the class ReplicatedPG to hold all the promoting objects. This fix is at https://github.com/ceph/ceph/pull/2374. It is tested and works well. Pls review and comment, thx.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Wang, Zhiqiang
> Sent: Monday, September 1, 2014 9:33 AM
> To: Sage Weil
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: RE: Cache tiering slow request issue: currently waiting for 
> rw locks
> 
> I don't think the object context is blocked at that time. It is un-blocked after the copying of data from base tier. It doesn't address the problem here. Anyway, I'll try it and see.
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Saturday, August 30, 2014 10:29 AM
> To: Wang, Zhiqiang
> Cc: 'ceph-devel@vger.kernel.org'
> Subject: Re: Cache tiering slow request issue: currently waiting for 
> rw locks
> 
> Hi,
> 
> Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if that addresses the behavior you saw?
> 
> Thanks!
> sage
> 
> 
> On Fri, 29 Aug 2014, Sage Weil wrote:
> 
> > Hi,
> > 
> > I've opened http://tracker.ceph.com/issues/9285 to track this.
> > 
> > I think you're right--we need a check in agent_maybe_evict() that 
> > will skip objects that are being promoted.  I suspect a flag on the 
> > ObjectContext is enough?
> > 
> > sage
> > 
> > 
> > On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
> > 
> > > Hi all,
> > > 
> > > I've ran into this slow request issue some time ago. The problem is like this: when running with cache tieing, there are 'slow request' warning messages in the log file like below.
> > > 
> > > 2014-08-29 10:18:24.669763 7f9b20f1b700  0 log [WRN] : 1 slow 
> > > requests, 1 included below; oldest blocked for > 30.996595 secs
> > > 2014-08-29 10:18:24.669768 7f9b20f1b700  0 log [WRN] : slow 
> > > request
> > > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142: 
> > > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 
> > > [sparse-read 3440640~4096] 45.cf45084b ack+read e26168) v4 
> > > currently waiting for rw locks
> > > 
> > > Recently I made some changes to the log, captured this problem, and finally figured out its root cause. You can check the attachment for the logs.
> > > 
> > > Here is the root cause:
> > > There is a cache miss when doing read. During promotion, after copying the data from base tier osd, the cache tier primary osd replicates the data to other cache tier osds. Some times this takes quite a long time. During this period of time, the promoted object may be evicted because the cache tier is full. When the primary osd finally gets the replication response and restarts the original read request, it doesn't find the object in the cache tier, and do promotion again. This loops for several times, and we'll see the 'slow request' in the logs. Theoretically, this could loops forever, and the request from the client would never be finished.
> > > 
> > > There is a simple fix for this:
> > > Add a field in the object state, indicating the status of the promotion. It's set to true after the copy of data from base tier and before the replication. It's reset to false after the replication and the original client request starts to execute. Evicting is not allowed when this field is true.
> > > 
> > > What do you think?
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> > in the body of a message to majordomo@vger.kernel.org More majordomo 
> > info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-09-16  8:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-29  6:43 Cache tiering slow request issue: currently waiting for rw locks Wang, Zhiqiang
2014-08-29 14:18 ` Sage Weil
2014-08-30  2:29   ` Sage Weil
2014-09-01  1:33     ` Wang, Zhiqiang
2014-09-02  6:54       ` Wang, Zhiqiang
2014-09-05  7:20         ` Wang, Zhiqiang
2014-09-09  7:12           ` Wang, Zhiqiang
2014-09-09  8:45             ` Chen, Xiaoxi
2014-09-09 16:34             ` Sage Weil
2014-09-10  1:13               ` Wang, Zhiqiang
2014-09-16  8:38               ` Wang, Zhiqiang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.