* cosd multi-second stalls cause "wrongly marked me down"
@ 2011-02-16 21:25 Jim Schutt
  2011-02-16 21:37 ` Wido den Hollander
  2011-02-16 21:40 ` Gregory Farnum
  0 siblings, 2 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-16 21:25 UTC (permalink / raw)
  To: ceph-devel

Hi,

I've been testing v0.24.3 w/ 64 clients against
1 mon, 1 mds, 96 osds.  Under heavy write load I
see:
  [WRN] map e7 wrongly marked me down or wrong addr

I was able to sort through the logs and discover that when
this happens I have large gaps (10 seconds or more) in osd
heartbeat processing.  In those heartbeat gaps I've found
long periods (5-15 seconds) where an osd logs nothing at all,
even though I am running with debug osd/filestore/journal = 20.

Is this a known issue?

Below are the excerpts I've culled from my logs that show these gaps.
Full logs are available on request.
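
For what it's worth, a quick awk pass over the tick timestamps is
one way to flag gaps like these; a rough sketch (the 5-second
threshold is arbitrary):

# grep ' tick$' osd.87.log |
    awk '{ split($2, t, ":")              # $2 is the HH:MM:SS.frac field
           s = t[1]*3600 + t[2]*60 + t[3]
           if (prev && s - prev > 5)      # flag anything over 5 seconds
               printf "%.1f second gap before %s %s\n", s - prev, $1, $2
           prev = s }'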

-- Jim


# grep -n "wrongly" osd*.log | dshbak -c
----------------
osd.17.log
----------------
472230:2011-02-16 11:40:29.340076 7fb6863d4940 log [WRN] : map e17 wrongly marked me down or wrong addr
----------------
osd.46.log
----------------
489102:2011-02-16 11:40:45.756536 7f949e98c940 log [WRN] : map e25 wrongly marked me down or wrong addr
----------------
osd.87.log
----------------
406661:2011-02-16 11:40:18.805586 7f0dfe3a7940 log [WRN] : map e7 wrongly marked me down or wrong addr
----------------
osd.40.log
----------------
495401:2011-02-16 11:40:38.057711 7fa6681c5940 log [WRN] : map e21 wrongly marked me down or wrong addr


# grep -n "no heartbeat from osd87" osd*.log | head -20 | dshbak -c
----------------
osd.95.log
----------------
443261:2011-02-16 11:40:10.886318 7f4e5b53b940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:50.886145)
443308:2011-02-16 11:40:10.887379 7f4e63f4e940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:50.887216)
443865:2011-02-16 11:40:14.680998 7f4e5b53b940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:54.680931)
443893:2011-02-16 11:40:14.681824 7f4e63f4e940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:54.681752)
----------------
osd.17.log
----------------
440651:2011-02-16 11:40:13.740999 7fb6821ca940 osd17 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:53.724161 (cutoff 2011-02-16 11:39:53.740937)
440763:2011-02-16 11:40:13.744726 7fb68abdd940 osd17 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:53.724161 (cutoff 2011-02-16 11:39:53.744673)
----------------
osd.46.log
----------------
439491:2011-02-16 11:40:08.860936 7f9495ffb940 osd46 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:48.265285 (cutoff 2011-02-16 11:39:48.860878)
----------------
osd.33.log
----------------
428947:2011-02-16 11:40:26.894541 7ffbed20c940 osd33 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:40:05.244130 (cutoff 2011-02-16 11:40:06.894512)
428950:2011-02-16 11:40:26.894686 7ffbf5c1f940 osd33 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:40:05.244130 (cutoff 2011-02-16 11:40:06.894669)
----------------
osd.73.log
----------------
394823:2011-02-16 11:40:08.649240 7f47b060d940 osd73 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:48.649213)
394835:2011-02-16 11:40:08.655061 7f47b9020940 osd73 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:48.655034)
395138:2011-02-16 11:40:12.720296 7f47b060d940 osd73 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:52.720253)
----------------
osd.0.log
----------------
418554:2011-02-16 11:40:11.534834 7fd5d3bd7940 osd0 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16 11:39:51.534770)
418686:2011-02-16 11:40:11.568725 7fd5dc5ea940 osd0 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16 11:39:51.549753)
418964:2011-02-16 11:40:13.380898 7fd5d3bd7940 osd0 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16 11:39:53.380831)


# grep -nH tick osd.87.log | egrep "11:(39:[3-9]|40:[0-2])"
osd.87.log:379692:2011-02-16 11:39:30.678342 7f0e033b1940 osd87 5 tick
osd.87.log:380284:2011-02-16 11:39:31.678652 7f0e033b1940 osd87 5 tick
osd.87.log:380974:2011-02-16 11:39:32.681217 7f0e033b1940 osd87 5 tick
osd.87.log:381406:2011-02-16 11:39:33.681646 7f0e033b1940 osd87 5 tick
osd.87.log:382004:2011-02-16 11:39:34.681930 7f0e033b1940 osd87 5 tick
osd.87.log:382660:2011-02-16 11:39:35.682177 7f0e033b1940 osd87 5 tick
osd.87.log:383068:2011-02-16 11:39:36.686511 7f0e033b1940 osd87 5 tick
osd.87.log:383849:2011-02-16 11:39:37.686750 7f0e033b1940 osd87 5 tick
osd.87.log:384487:2011-02-16 11:39:38.687127 7f0e033b1940 osd87 5 tick
osd.87.log:384561:2011-02-16 11:39:39.687908 7f0e033b1940 osd87 5 tick
osd.87.log:386015:2011-02-16 11:39:41.936988 7f0e033b1940 osd87 5 tick
osd.87.log:386467:2011-02-16 11:39:44.322215 7f0e033b1940 osd87 5 tick
osd.87.log:388404:2011-02-16 11:39:46.399688 7f0e033b1940 osd87 5 tick
osd.87.log:389153:2011-02-16 11:39:47.400058 7f0e033b1940 osd87 5 tick
osd.87.log:389484:2011-02-16 11:39:48.403479 7f0e033b1940 osd87 5 tick <==
osd.87.log:392292:2011-02-16 11:40:00.338113 7f0e033b1940 osd87 5 tick <== 12 second gap
osd.87.log:392903:2011-02-16 11:40:01.339041 7f0e033b1940 osd87 5 tick
osd.87.log:392948:2011-02-16 11:40:02.339450 7f0e033b1940 osd87 5 tick
osd.87.log:394922:2011-02-16 11:40:04.740211 7f0e033b1940 osd87 5 tick
osd.87.log:395597:2011-02-16 11:40:06.063388 7f0e033b1940 osd87 5 tick
osd.87.log:395623:2011-02-16 11:40:07.063841 7f0e033b1940 osd87 5 tick <==
osd.87.log:398449:2011-02-16 11:40:16.109719 7f0e033b1940 osd87 5 tick <== 9 second gap
osd.87.log:400131:2011-02-16 11:40:17.934761 7f0e033b1940 osd87 5 tick
osd.87.log:410005:2011-02-16 11:40:21.725596 7f0e033b1940 osd87 7 tick
osd.87.log:412432:2011-02-16 11:40:22.725940 7f0e033b1940 osd87 11 tick
osd.87.log:427258:2011-02-16 11:40:24.524376 7f0e033b1940 osd87 14 tick
osd.87.log:432187:2011-02-16 11:40:25.524614 7f0e033b1940 osd87 14 tick
osd.87.log:434222:2011-02-16 11:40:26.524970 7f0e033b1940 osd87 14 tick
osd.87.log:438352:2011-02-16 11:40:27.525224 7f0e033b1940 osd87 15 tick
osd.87.log:444226:2011-02-16 11:40:28.526490 7f0e033b1940 osd87 17 tick
osd.87.log:447127:2011-02-16 11:40:29.529372 7f0e033b1940 osd87 17 tick


# egrep -nHe "--> osd0 " osd.87.log | grep osd_ping | egrep "11:(39:[3-9]|40:[0-2])"
osd.87.log:379735:2011-02-16 11:39:30.930841 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50
osd.87.log:380819:2011-02-16 11:39:32.232412 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df86c2df0
osd.87.log:381277:2011-02-16 11:39:33.233895 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de4041c20
osd.87.log:381924:2011-02-16 11:39:34.638658 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de401dda0
osd.87.log:382680:2011-02-16 11:39:35.842033 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de404baf0
osd.87.log:383126:2011-02-16 11:39:36.845583 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de406dc20
osd.87.log:384121:2011-02-16 11:39:38.047605 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de4039c20
osd.87.log:384547:2011-02-16 11:39:38.949228 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x786c470
osd.87.log:386105:2011-02-16 11:39:42.216853 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de411f380
osd.87.log:386366:2011-02-16 11:39:42.826071 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df808ac20
osd.87.log:387864:2011-02-16 11:39:45.928938 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dc8401010
osd.87.log:388691:2011-02-16 11:39:46.935489 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800d910 <==
osd.87.log:389579:2011-02-16 11:39:58.999668 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800dd50 <== 12 second gap
osd.87.log:391811:2011-02-16 11:39:59.604130 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df82b1e50
osd.87.log:392297:2011-02-16 11:40:00.405855 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x23121e0
osd.87.log:392926:2011-02-16 11:40:02.098362 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800b1d0
osd.87.log:393945:2011-02-16 11:40:04.239303 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd803d6d0
osd.87.log:395329:2011-02-16 11:40:05.240611 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd8033c80
osd.87.log:395607:2011-02-16 11:40:06.142020 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50 <==
osd.87.log:398446:2011-02-16 11:40:16.109618 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50 <== 10 second gap
osd.87.log:399593:2011-02-16 11:40:16.712331 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40d8e90
osd.87.log:400034:2011-02-16 11:40:17.913637 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de41b2c30
osd.87.log:424702:2011-02-16 11:40:24.116636 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e0 as_of 14) v1 -- ?+0 0x7f0de404baf0
osd.87.log:430175:2011-02-16 11:40:24.619852 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de40cec30
osd.87.log:431874:2011-02-16 11:40:25.221671 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de404baf0
osd.87.log:433563:2011-02-16 11:40:26.123080 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de40cfc00
osd.87.log:438357:2011-02-16 11:40:27.525371 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e15 as_of 15) v1 -- ?+0 0x7f0de412dc20
osd.87.log:443114:2011-02-16 11:40:28.226793 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de40d3e10
osd.87.log:445140:2011-02-16 11:40:28.829473 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de412ee20
osd.87.log:446889:2011-02-16 11:40:29.431710 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de40d3e10


# grep -nH "<== osd87 " osd.0.log | grep osd_ping | egrep "11:(39:[3-9]|40:[0-2])"
osd.0.log:405348:2011-02-16 11:39:30.934972 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 836 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3409844043 0 0) 0x2ce8260 con 0x7fd5cc0eb4e0
osd.0.log:406113:2011-02-16 11:39:32.243478 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 837 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2347402994 0 0) 0x2d08250 con 0x7fd5cc0eb4e0
osd.0.log:407084:2011-02-16 11:39:33.315517 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 838 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1541899367 0 0) 0x2ce8260 con 0x7fd5cc0eb4e0
osd.0.log:407879:2011-02-16 11:39:34.643198 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 839 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3930157039 0 0) 0x2cb6990 con 0x7fd5cc0eb4e0
osd.0.log:408685:2011-02-16 11:39:35.847410 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 840 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3997131881 0 0) 0x2cd0430 con 0x7fd5cc0eb4e0
osd.0.log:409558:2011-02-16 11:39:36.850722 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 841 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2880002961 0 0) 0x2933750 con 0x7fd5cc0eb4e0
osd.0.log:410150:2011-02-16 11:39:38.058936 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 842 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (422669297 0 0) 0x7fd5b4026b80 con 0x7fd5cc0eb4e0
osd.0.log:410785:2011-02-16 11:39:38.951966 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 843 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2887960658 0 0) 0x7fd5b406d7e0 con 0x7fd5cc0eb4e0
osd.0.log:412279:2011-02-16 11:39:42.328051 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 844 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (888536454 0 0) 0x7fd5c4434bf0 con 0x7fd5cc0eb4e0
osd.0.log:413248:2011-02-16 11:39:44.591247 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 845 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3942869391 0 0) 0x7fd5c4422da0 con 0x7fd5cc0eb4e0
osd.0.log:414564:2011-02-16 11:39:46.587410 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 846 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (536010906 0 0) 0x2d0cc70 con 0x7fd5cc0eb4e0
osd.0.log:415059:2011-02-16 11:39:46.939610 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 847 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1952944560 0 0) 0x2cac660 con 0x7fd5cc0eb4e0       <==
osd.0.log:418969:2011-02-16 11:40:13.382178 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 848 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1901796990 0 0) 0x7fd5a0016010 con 0x7fd5cc0eb4e0  <== 26 second gap
osd.0.log:419358:2011-02-16 11:40:13.423453 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 849 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (687989221 0 0) 0x7fd5a0018160 con 0x7fd5cc0eb4e0
osd.0.log:419573:2011-02-16 11:40:13.426088 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 850 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1942960184 0 0) 0x7fd5a0018340 con 0x7fd5cc0eb4e0
osd.0.log:420236:2011-02-16 11:40:13.599647 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 851 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1582392597 0 0) 0x7fd5a0018760 con 0x7fd5cc0eb4e0
osd.0.log:420431:2011-02-16 11:40:13.638125 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 852 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (652463144 0 0) 0x7fd5a00189f0 con 0x7fd5cc0eb4e0
osd.0.log:420737:2011-02-16 11:40:13.731877 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 853 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3298612745 0 0) 0x7fd5a0018d20 con 0x7fd5cc0eb4e0
osd.0.log:420907:2011-02-16 11:40:13.743052 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 854 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3389558799 0 0) 0x7fd5a1024010 con 0x7fd5cc0eb4e0
osd.0.log:423056:2011-02-16 11:40:20.484117 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 855 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (4197551345 0 0) 0x7fd5c4001b20 con 0x7fd5cc0eb4e0
osd.0.log:423342:2011-02-16 11:40:20.522679 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 856 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2447834242 0 0) 0x7fd5c40bac90 con 0x7fd5cc0eb4e0
osd.0.log:423565:2011-02-16 11:40:20.551600 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 857 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3477883131 0 0) 0x7fd5c4456cb0 con 0x7fd5cc0eb4e0
osd.0.log:431891:2011-02-16 11:40:24.336220 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 1 ==== osd_ping(e0 as_of 14) v1 ==== 61+0+0 (981970893 0 0) 0x2b90d50 con 0x29a5e90
osd.0.log:431957:2011-02-16 11:40:24.620956 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 2 ==== osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2171246889 0 0) 0x29069e0 con 0x29a5e90
osd.0.log:432146:2011-02-16 11:40:25.223233 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 3 ==== osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2197848278 0 0) 0x8e7aa60 con 0x29a5e90
osd.0.log:432373:2011-02-16 11:40:26.126494 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 4 ==== osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2473034353 0 0) 0x2caec80 con 0x29a5e90
osd.0.log:434792:2011-02-16 11:40:27.531515 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 5 ==== osd_ping(e15 as_of 15) v1 ==== 61+0+0 (1656259643 0 0) 0x2e10370 con 0x29a5e90
osd.0.log:436542:2011-02-16 11:40:28.267676 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 6 ==== osd_ping(e17 as_of 17) v1 ==== 61+0+0 (414112261 0 0) 0x2931800 con 0x29a5e90
osd.0.log:437755:2011-02-16 11:40:28.830618 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 7 ==== osd_ping(e17 as_of 17) v1 ==== 61+0+0 (1206729933 0 0) 0x29b5d80 con 0x29a5e90



# grep -nH -A 10000 "11:39:49.712017" osd.87.log | grep -B 10000 "11:39:58.813658"
osd.87.log:389490:2011-02-16 11:39:49.712017 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) sync_entry committing 2101 sync_epoch 114
osd.87.log-389491-2011-02-16 11:39:49.712521 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) taking async snap 'snap_2101'
osd.87.log-389492-2011-02-16 11:39:49.750398 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) async snap create 'snap_2101' transid 134 got 0 Success
osd.87.log-389493-2011-02-16 11:39:49.750455 7f0e01bae940 journal commit_started committing 2101, unblocking
osd.87.log-389494-2011-02-16 11:39:49.750463 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087)  waiting for transid 134 to complete
osd.87.log-389495-2011-02-16 11:39:49.813245 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087)  done waiting for transid 134 to complete
osd.87.log-389496-2011-02-16 11:39:49.813306 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) sync_entry commit took 0.101289
osd.87.log-389497-2011-02-16 11:39:49.813313 7f0e01bae940 journal commit_finish thru 2101
osd.87.log-389498-2011-02-16 11:39:49.813320 7f0e01bae940 journal committed_thru 2101 (last_committed_seq 2073)                                                                                <==
osd.87.log-389499-2011-02-16 11:39:58.800113 7f0dffbaa940 osd87 5 pg[0.14bc( v 5'3 (5'1,5'3] n=3 ec=2 les=4 3/3/3) [87,72] r=0 luod=5'2 lcod 5'2 mlcod 5'1 active+clean] update_stats 3'16     <== 9 second gap
osd.87.log-389500-2011-02-16 11:39:58.800198 7f0dffbaa940 osd87 5 pg[0.14bc( v 5'3 (5'1,5'3] n=3 ec=2 les=4 3/3/3) [87,72] r=0 luod=5'2 lcod 5'2 mlcod 5'1 active+clean] eval_repop repgather(0x11493c30 applied 5'3 rep_tid=462 wfack=72 wfdisk=72,87 op=osd_op(client4232.1:72 10000009496.00000047 [write 0~4194304 [1@-1]] 0.94bc snapc 1=[])) wants=d
osd.87.log-389501-2011-02-16 11:39:58.800233 7f0dffbaa940 osd87 5 pg[0.379( v 5'2 (0'0,5'2] n=2 ec=3 les=4 3/3/3) [77,87] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x2298400 op osd_sub_op(client4236.1:14 0.379 10000004a4b.0000000d/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3
osd.87.log-389502-2011-02-16 11:39:58.813597 7f0e01bae940 journal header: block_size 4096 alignment 4096 max_size 526385152
osd.87.log-389503-2011-02-16 11:39:58.813614 7f0e01bae940 journal header: start 86712320
osd.87.log-389504-2011-02-16 11:39:58.813620 7f0e01bae940 journal  write_pos 86712320
osd.87.log-389505-2011-02-16 11:39:58.813626 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2084 0x7f0df86a0cc0
osd.87.log-389506-2011-02-16 11:39:58.813643 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2085 0x7f0df8237e80
osd.87.log-389507-2011-02-16 11:39:58.813651 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2086 0x7f0df81b6c90
osd.87.log-389508-2011-02-16 11:39:58.813658 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2087 0x7f0df8150fb0


# grep -nH -A 10000 "11:40:09.777220" osd.87.log | grep -B 10000 "11:40:14.633381"
osd.87.log:395664:2011-02-16 11:40:09.777220 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2129 0x7f0dd804f630
osd.87.log-395665-2011-02-16 11:40:09.777227 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2130 0x7f0dc000b230
osd.87.log-395666-2011-02-16 11:40:09.777233 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2131 0x7f0dd8000b40
osd.87.log-395667-2011-02-16 11:40:09.777239 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2132 0x7f0dd80a5fe0
osd.87.log-395668-2011-02-16 11:40:09.777245 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2133 0x7f0dd8013fe0
osd.87.log-395669-2011-02-16 11:40:09.777250 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2134 0x7f0dd8055560
osd.87.log-395670-2011-02-16 11:40:09.777258 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2135 0x7f0dd801da40
osd.87.log-395671-2011-02-16 11:40:09.777264 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2136 0x7f0dc0013fe0
osd.87.log-395672-2011-02-16 11:40:09.777270 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2137 0x7f0dd8005290
osd.87.log-395673-2011-02-16 11:40:09.777277 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2138 0x7f0dd84c23f0
osd.87.log-395674-2011-02-16 11:40:09.777285 7f0e01bae940 journal  dropping committed but unwritten seq 2124 len 4195464
osd.87.log-395675-2011-02-16 11:40:09.777312 7f0e01bae940 journal  dropping committed but unwritten seq 2125 len 4195507
osd.87.log-395676-2011-02-16 11:40:09.777339 7f0df7fff940 journal throttle: waited for bytes
osd.87.log-395677-2011-02-16 11:40:09.777426 7f0e023af940 osd87 5 pg[0.f90( v 5'2 (0'0,5'2] n=2 ec=3 les=4 3/3/3) [45,87] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_commit on op osd_sub_op(client4196.1:130 0.f90 100000007d2.00000081/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3, sending commit to osd45
osd.87.log-395678-2011-02-16 11:40:09.777493 7f0dfc9a2940 journal throttle: waited for bytes                                                                                                   <==
osd.87.log-395679-2011-02-16 11:40:14.407019 7f0df36d9940 -- 172.17.40.31:6821/23945 >> 172.17.40.35:6800/20993 pipe(0x22b7660 sd=42 pgs=5 cs=1 l=1).reader couldn't read tag, Success         <== 5 second gap
osd.87.log-395680-2011-02-16 11:40:14.628871 7f0de3afa940 -- 172.17.40.31:6821/23945 >> 172.17.40.42:0/4169731470 pipe(0x22e9880 sd=129 pgs=6 cs=1 l=1).reader couldn't read tag, Success
osd.87.log-395681-2011-02-16 11:40:14.628926 7f0dd41c3940 -- 172.17.40.31:6821/23945 >> 172.17.40.61:0/2426953348 pipe(0x2377140 sd=190 pgs=56 cs=1 l=1).reader couldn't read tag, Success
osd.87.log-395682-2011-02-16 11:40:14.631264 7f0dd41c3940 -- 172.17.40.31:6821/23945 >> 172.17.40.61:0/2426953348 pipe(0x2377140 sd=190 pgs=56 cs=1 l=1).fault 0: Success
osd.87.log-395683-2011-02-16 11:40:14.633050 7f0dd3fc1940 -- 172.17.40.31:6821/23945 >> 172.17.40.63:0/179283478 pipe(0x23776b0 sd=191 pgs=64 cs=1 l=1).reader couldn't read tag, Success
osd.87.log-395684-2011-02-16 11:40:14.633130 7f0dd3fc1940 -- 172.17.40.31:6821/23945 >> 172.17.40.63:0/179283478 pipe(0x23776b0 sd=191 pgs=64 cs=1 l=1).fault 0: Success
osd.87.log-395685-2011-02-16 11:40:14.633210 7f0dde2aa940 -- 172.17.40.31:6821/23945 >> 172.17.40.56:0/193342550 pipe(0x2385620 sd=154 pgs=15 cs=1 l=1).reader couldn't read tag, Success
osd.87.log-395686-2011-02-16 11:40:14.633238 7f0dde2aa940 -- 172.17.40.31:6821/23945 >> 172.17.40.56:0/193342550 pipe(0x2385620 sd=154 pgs=15 cs=1 l=1).fault 0: Success
osd.87.log-395687-2011-02-16 11:40:14.633308 7f0dde0a8940 -- 172.17.40.31:6821/23945 >> 172.17.40.66:0/2716831927 pipe(0x2317b20 sd=155 pgs=16 cs=1 l=1).reader couldn't read tag, Success
osd.87.log-395688-2011-02-16 11:40:14.633335 7f0dde0a8940 -- 172.17.40.31:6821/23945 >> 172.17.40.66:0/2716831927 pipe(0x2317b20 sd=155 pgs=16 cs=1 l=1).fault 0: Success
osd.87.log-395689-2011-02-16 11:40:14.633381 7f0dd65e5940 -- 172.17.40.31:6821/23945 >> 172.17.40.54:0/3922929349 pipe(0x22d49e0 sd=174 pgs=45 cs=1 l=1).reader couldn't read tag, Success


# grep -nH -A 10000 "11:39:56.765107" osd.0.log | grep -B 10000 "11:40:11.526593"
osd.0.log:418422:2011-02-16 11:39:56.765107 7fd5d75e0940 osd0 5 pg[0.c26( v 5'5 (5'3,5'5] n=5 ec=2 les=4 3/3/3) [25,0] r=1 luod=0'0 lcod 5'4 active] enqueue_op 0x7fd5c404d300 osd_sub_op(client4210.1:231 0.c26 10000002b03.000000e4/head [] v 5'6 snapset=0=[]:[] snapc=0=[]) v3
osd.0.log-418423-2011-02-16 11:39:56.765131 7fd5d75e0940 -- 172.17.40.21:6801/10701 <== osd39 172.17.40.25:6822/18259 44 ==== osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1 ==== 127+0+0 (699474326 0 0) 0x2bfe000 con 0x7fd5c40f5280
osd.0.log-418424-2011-02-16 11:39:56.765144 7fd5d75e0940 osd0 5 _dispatch 0x2bfe000 osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1
osd.0.log-418425-2011-02-16 11:39:56.765153 7fd5d75e0940 osd0 5 require_same_or_newer_map 5 (i am 5) 0x2bfe000
osd.0.log-418426-2011-02-16 11:39:56.765160 7fd5d75e0940 osd0 5 _share_map_incoming osd39 172.17.40.25:6822/18259 5
osd.0.log-418427-2011-02-16 11:39:56.765171 7fd5d75e0940 osd0 5 pg[0.f51( v 5'3 (0'0,5'3] n=3 ec=3 les=4 3/3/3) [0,39] r=0 luod=5'2 lcod 5'2 mlcod 0'0 active+clean] enqueue_op 0x2bfe000 osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1
osd.0.log-418428-2011-02-16 11:39:56.765195 7fd5d75e0940 -- 172.17.40.21:6801/10701 <== osd69 172.17.40.29:6816/32415 36 ==== osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3 ==== 532+0+4194764 (1547976727 0 29215660) 0x7fd5b40750d0 con 0x2c91590
osd.0.log-418429-2011-02-16 11:39:56.765210 7fd5d75e0940 osd0 5 _dispatch 0x7fd5b40750d0 osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3
osd.0.log-418430-2011-02-16 11:39:56.765230 7fd5d75e0940 osd0 5 handle_sub_op osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3 epoch 5
osd.0.log-418431-2011-02-16 11:39:56.765239 7fd5d75e0940 osd0 5 require_same_or_newer_map 5 (i am 5) 0x7fd5b40750d0
osd.0.log-418432-2011-02-16 11:39:56.765247 7fd5d75e0940 osd0 5 _share_map_incoming osd69 172.17.40.29:6816/32415 5
osd.0.log-418433-2011-02-16 11:39:56.767977 7fd5db5e8940 osd0 5 pg[0.b68( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [51,0] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_commit on op osd_sub_op(client4257.1:265 0.b68 10000006d7c.00000103/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3, sending commit to osd51
osd.0.log-418434-2011-02-16 11:40:11.526201 7fd5dbde9940 journal queue_completions_thru seq 2253 queueing seq 2252 0x7fd5a0000ec0    <== 15 second gap
osd.0.log-418435-2011-02-16 11:40:11.526229 7fd5dbde9940 journal queue_completions_thru seq 2253 queueing seq 2253 0x7fd59800f790
osd.0.log-418436-2011-02-16 11:40:11.526242 7fd5dbde9940 journal write_thread throttle finished 3 ops and 12586409 bytes, now 5 ops and 20977433 bytes
osd.0.log-418437-2011-02-16 11:40:11.526278 7fd5dbde9940 journal room 501166079 max_size 526385152 pos 127389696 header.start 102174720 top 4096
osd.0.log-418438-2011-02-16 11:40:11.526285 7fd5dbde9940 journal check_for_full at 127389696 : 4202496 < 501166079
osd.0.log-418439-2011-02-16 11:40:11.526291 7fd5dbde9940 journal prepare_single_write 1 will write 127389696 : seq 2254 len 4195516 -> 4202496 (head 40 pre_pad 3891 ebl 4195516 post_pad 3009 tail 40) (ebl alignment 3931)
osd.0.log-418440-2011-02-16 11:40:11.526309 7fd5dbde9940 journal room 496963583 max_size 526385152 pos 131592192 header.start 102174720 top 4096
osd.0.log-418441-2011-02-16 11:40:11.526316 7fd5dbde9940 journal check_for_full at 131592192 : 4202496 < 496963583
osd.0.log-418442-2011-02-16 11:40:11.526321 7fd5dbde9940 journal prepare_single_write 2 will write 131592192 : seq 2255 len 4195464 -> 4202496 (head 40 pre_pad 3891 ebl 4195464 post_pad 3061 tail 40) (ebl alignment 3931)
osd.0.log-418443-2011-02-16 11:40:11.526331 7fd5dbde9940 journal room 492761087 max_size 526385152 pos 135794688 header.start 102174720 top 4096
osd.0.log-418444-2011-02-16 11:40:11.526337 7fd5dbde9940 journal check_for_full at 135794688 : 4202496 < 492761087
osd.0.log-418445-2011-02-16 11:40:11.526343 7fd5dbde9940 journal prepare_single_write 3 will write 135794688 : seq 2256 len 4195446 -> 4202496 (head 40 pre_pad 3895 ebl 4195446 post_pad 3075 tail 40) (ebl alignment 3935)
osd.0.log-418446-2011-02-16 11:40:11.526593 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd42 172.17.40.26:6808/8729 877 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3742513708 0 0) 0x7fd5c43e1c30 con 0x7fd5cc0e3d90







* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-16 21:25 cosd multi-second stalls cause "wrongly marked me down" Jim Schutt
@ 2011-02-16 21:37 ` Wido den Hollander
  2011-02-16 21:51   ` Jim Schutt
  2011-02-16 21:40 ` Gregory Farnum
  1 sibling, 1 reply; 94+ messages in thread
From: Wido den Hollander @ 2011-02-16 21:37 UTC (permalink / raw)
  To: Jim Schutt, ceph-devel

Hi,
----- Original message -----
> Hi,
> 
> I've been testing v0.24.3 w/ 64 clients against
> 1 mon, 1 mds, 96 osds.   Under heavy write load I
> see:
>     [WRN] map e7 wrongly marked me down or wrong addr

I'm seeing the same, for example when the cluster is recovering.

My thought is that it is something btrfs-related that the OSD is hanging on; do you see the same? (Check the stack of the process in /proc.) (Thanks, Colin!)

It showed me that it was stalling on btrfs ioctls.
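
Something along these lines should show where each cosd thread is
stuck in the kernel (run as root, on a kernel built with
CONFIG_STACKTRACE, which provides /proc/<pid>/task/<tid>/stack):

# for p in $(pidof cosd); do echo "== cosd $p =="; cat /proc/$p/task/*/stack; done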

Wido
> 
> I was able to sort through the logs and discover that when 
> this happens I have large gaps (10 seconds or more) in osd 
> heatbeat processing.   In those heartbeat gaps I've discovered 
> long periods (5-15 seconds) where an osd logs nothing, even 
> though I am running with debug osd/filestore/journal = 20.
> 
> Is this a known issue?
> 
> Below is what I've culled from my logs that show these gaps.
> Full logs available on request.
> 
> -- Jim
> 
> 
> # grep -n "wrongly" osd*.log | dshbak -c
> ----------------
> osd.17.log
> ----------------
> 472230:2011-02-16 11:40:29.340076 7fb6863d4940 log [WRN] : map e17
> wrongly marked me down or wrong addr ----------------
> osd.46.log
> ----------------
> 489102:2011-02-16 11:40:45.756536 7f949e98c940 log [WRN] : map e25
> wrongly marked me down or wrong addr ----------------
> osd.87.log
> ----------------
> 406661:2011-02-16 11:40:18.805586 7f0dfe3a7940 log [WRN] : map e7
> wrongly marked me down or wrong addr ----------------
> osd.40.log
> ----------------
> 495401:2011-02-16 11:40:38.057711 7fa6681c5940 log [WRN] : map e21
> wrongly marked me down or wrong addr
> 
> 
> # grep -n "no heartbeat from osd87" osd*.log | head -20 | dshbak -c
> ----------------
> osd.95.log
> ----------------
> 443261:2011-02-16 11:40:10.886318 7f4e5b53b940 osd95 5 heartbeat_check:
> no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff
> 2011-02-16 11:39:50.886145) 443308:2011-02-16 11:40:10.887379
> 7f4e63f4e940 osd95 5 heartbeat_check: no heartbeat from osd87 since
> 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:50.887216)
> 443865:2011-02-16 11:40:14.680998 7f4e5b53b940 osd95 5 heartbeat_check:
> no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff
> 2011-02-16 11:39:54.680931) 443893:2011-02-16 11:40:14.681824
> 7f4e63f4e940 osd95 5 heartbeat_check: no heartbeat from osd87 since
> 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:54.681752)
> ---------------- osd.17.log ---------------- 440651:2011-02-16
> 11:40:13.740999 7fb6821ca940 osd17 5 heartbeat_check: no heartbeat from
> osd87 since 2011-02-16 11:39:53.724161 (cutoff 2011-02-16
> 11:39:53.740937) 440763:2011-02-16 11:40:13.744726 7fb68abdd940 osd17 5
> heartbeat_check: no heartbeat from osd87 since 2011-02-16
> 11:39:53.724161 (cutoff 2011-02-16 11:39:53.744673) ----------------
> osd.46.log ---------------- 439491:2011-02-16 11:40:08.860936
> 7f9495ffb940 osd46 5 heartbeat_check: no heartbeat from osd87 since
> 2011-02-16 11:39:48.265285 (cutoff 2011-02-16 11:39:48.860878)
> ---------------- osd.33.log ---------------- 428947:2011-02-16
> 11:40:26.894541 7ffbed20c940 osd33 5 heartbeat_check: no heartbeat from
> osd87 since 2011-02-16 11:40:05.244130 (cutoff 2011-02-16
> 11:40:06.894512) 428950:2011-02-16 11:40:26.894686 7ffbf5c1f940 osd33 5
> heartbeat_check: no heartbeat from osd87 since 2011-02-16
> 11:40:05.244130 (cutoff 2011-02-16 11:40:06.894669) ----------------
> osd.73.log ---------------- 394823:2011-02-16 11:40:08.649240
> 7f47b060d940 osd73 5 heartbeat_check: no heartbeat from osd87 since
> 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:48.649213)
> 394835:2011-02-16 11:40:08.655061 7f47b9020940 osd73 5 heartbeat_check:
> no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff
> 2011-02-16 11:39:48.655034) 395138:2011-02-16 11:40:12.720296
> 7f47b060d940 osd73 5 heartbeat_check: no heartbeat from osd87 since
> 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:52.720253)
> ---------------- osd.0.log ---------------- 418554:2011-02-16
> 11:40:11.534834 7fd5d3bd7940 osd0 5 heartbeat_check: no heartbeat from
> osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16
> 11:39:51.534770) 418686:2011-02-16 11:40:11.568725 7fd5dc5ea940 osd0 5
> heartbeat_check: no heartbeat from osd87 since 2011-02-16
> 11:39:46.939692 (cutoff 2011-02-16 11:39:51.549753) 418964:2011-02-16
> 11:40:13.380898 7fd5d3bd7940 osd0 5 heartbeat_check: no heartbeat from
> osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16
> 11:39:53.380831)
> 
> 
> # grep -nH tick osd.87.log | egrep "11:(39:[3-9]|40:[0-2])"
> osd.87.log:379692:2011-02-16 11:39:30.678342 7f0e033b1940 osd87 5 tick
> osd.87.log:380284:2011-02-16 11:39:31.678652 7f0e033b1940 osd87 5 tick
> osd.87.log:380974:2011-02-16 11:39:32.681217 7f0e033b1940 osd87 5 tick
> osd.87.log:381406:2011-02-16 11:39:33.681646 7f0e033b1940 osd87 5 tick
> osd.87.log:382004:2011-02-16 11:39:34.681930 7f0e033b1940 osd87 5 tick
> osd.87.log:382660:2011-02-16 11:39:35.682177 7f0e033b1940 osd87 5 tick
> osd.87.log:383068:2011-02-16 11:39:36.686511 7f0e033b1940 osd87 5 tick
> osd.87.log:383849:2011-02-16 11:39:37.686750 7f0e033b1940 osd87 5 tick
> osd.87.log:384487:2011-02-16 11:39:38.687127 7f0e033b1940 osd87 5 tick
> osd.87.log:384561:2011-02-16 11:39:39.687908 7f0e033b1940 osd87 5 tick
> osd.87.log:386015:2011-02-16 11:39:41.936988 7f0e033b1940 osd87 5 tick
> osd.87.log:386467:2011-02-16 11:39:44.322215 7f0e033b1940 osd87 5 tick
> osd.87.log:388404:2011-02-16 11:39:46.399688 7f0e033b1940 osd87 5 tick
> osd.87.log:389153:2011-02-16 11:39:47.400058 7f0e033b1940 osd87 5 tick
> osd.87.log:389484:2011-02-16 11:39:48.403479 7f0e033b1940 osd87 5 tick
> <== osd.87.log:392292:2011-02-16 11:40:00.338113 7f0e033b1940 osd87 5
> tick <== 12 second gap osd.87.log:392903:2011-02-16 11:40:01.339041
> 7f0e033b1940 osd87 5 tick osd.87.log:392948:2011-02-16 11:40:02.339450
> 7f0e033b1940 osd87 5 tick osd.87.log:394922:2011-02-16 11:40:04.740211
> 7f0e033b1940 osd87 5 tick osd.87.log:395597:2011-02-16 11:40:06.063388
> 7f0e033b1940 osd87 5 tick osd.87.log:395623:2011-02-16 11:40:07.063841
> 7f0e033b1940 osd87 5 tick <== osd.87.log:398449:2011-02-16
> 11:40:16.109719 7f0e033b1940 osd87 5 tick <== 9 second gap
> osd.87.log:400131:2011-02-16 11:40:17.934761 7f0e033b1940 osd87 5 tick
> osd.87.log:410005:2011-02-16 11:40:21.725596 7f0e033b1940 osd87 7 tick
> osd.87.log:412432:2011-02-16 11:40:22.725940 7f0e033b1940 osd87 11 tick
> osd.87.log:427258:2011-02-16 11:40:24.524376 7f0e033b1940 osd87 14 tick
> osd.87.log:432187:2011-02-16 11:40:25.524614 7f0e033b1940 osd87 14 tick
> osd.87.log:434222:2011-02-16 11:40:26.524970 7f0e033b1940 osd87 14 tick
> osd.87.log:438352:2011-02-16 11:40:27.525224 7f0e033b1940 osd87 15 tick
> osd.87.log:444226:2011-02-16 11:40:28.526490 7f0e033b1940 osd87 17 tick
> osd.87.log:447127:2011-02-16 11:40:29.529372 7f0e033b1940 osd87 17 tick
> 
> 
> # egrep -nHe "--> osd0 " osd.87.log | grep osd_ping | egrep
> "11:(39:[3-9]|40:[0-2])" osd.87.log:379735:2011-02-16 11:39:30.930841
> 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701
> -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50
> osd.87.log:380819:2011-02-16 11:39:32.232412 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0df86c2df0 osd.87.log:381277:2011-02-16
> 11:39:33.233895 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de4041c20
> osd.87.log:381924:2011-02-16 11:39:34.638658 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0de401dda0 osd.87.log:382680:2011-02-16
> 11:39:35.842033 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de404baf0
> osd.87.log:383126:2011-02-16 11:39:36.845583 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0de406dc20 osd.87.log:384121:2011-02-16
> 11:39:38.047605 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de4039c20
> osd.87.log:384547:2011-02-16 11:39:38.949228 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x786c470 osd.87.log:386105:2011-02-16
> 11:39:42.216853 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de411f380
> osd.87.log:386366:2011-02-16 11:39:42.826071 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0df808ac20 osd.87.log:387864:2011-02-16
> 11:39:45.928938 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dc8401010
> osd.87.log:388691:2011-02-16 11:39:46.935489 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0dd800d910 <== osd.87.log:389579:2011-02-16
> 11:39:58.999668 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800dd50
> <== 12 second gap osd.87.log:391811:2011-02-16 11:39:59.604130
> 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701
> -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df82b1e50
> osd.87.log:392297:2011-02-16 11:40:00.405855 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x23121e0 osd.87.log:392926:2011-02-16
> 11:40:02.098362 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800b1d0
> osd.87.log:393945:2011-02-16 11:40:04.239303 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0dd803d6d0 osd.87.log:395329:2011-02-16
> 11:40:05.240611 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd8033c80
> osd.87.log:395607:2011-02-16 11:40:06.142020 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0de40c7e50 <== osd.87.log:398446:2011-02-16
> 11:40:16.109618 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50
> <== 10 second gap osd.87.log:399593:2011-02-16 11:40:16.712331
> 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701
> -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40d8e90
> osd.87.log:400034:2011-02-16 11:40:17.913637 7f0df67fc940 --
> 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5
> as_of 5) v1 -- ?+0 0x7f0de41b2c30 osd.87.log:424702:2011-02-16
> 11:40:24.116636 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e0 as_of 14) v1 -- ?+0
> 0x7f0de404baf0 osd.87.log:430175:2011-02-16 11:40:24.619852 7f0df67fc940
> -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 --
> osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de40cec30
> osd.87.log:431874:2011-02-16 11:40:25.221671 7f0df67fc940 --
> 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14
> as_of 14) v1 -- ?+0 0x7f0de404baf0 osd.87.log:433563:2011-02-16
> 11:40:26.123080 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0
> 0x7f0de40cfc00 osd.87.log:438357:2011-02-16 11:40:27.525371 7f0df67fc940
> -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 --
> osd_ping(e15 as_of 15) v1 -- ?+0 0x7f0de412dc20
> osd.87.log:443114:2011-02-16 11:40:28.226793 7f0df67fc940 --
> 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17
> as_of 17) v1 -- ?+0 0x7f0de40d3e10 osd.87.log:445140:2011-02-16
> 11:40:28.829473 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0
> 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0
> 0x7f0de412ee20 osd.87.log:446889:2011-02-16 11:40:29.431710 7f0df67fc940
> -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 --
> osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de40d3e10
> 
> 
> # grep -nH "<== osd87 " osd.0.log | grep osd_ping | egrep
> "11:(39:[3-9]|40:[0-2])" osd.0.log:405348:2011-02-16 11:39:30.934972
> 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87
> 172.17.40.31:6823/23945 836 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0
> (3409844043 0 0) 0x2ce8260 con 0x7fd5cc0eb4e0
> osd.0.log:406113:2011-02-16 11:39:32.243478 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 837 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2347402994 0 0) 0x2d08250 con
> 0x7fd5cc0eb4e0 osd.0.log:407084:2011-02-16 11:39:33.315517 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 838 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1541899367 0 0) 0x2ce8260 con
> 0x7fd5cc0eb4e0 osd.0.log:407879:2011-02-16 11:39:34.643198 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 839 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3930157039 0 0) 0x2cb6990 con
> 0x7fd5cc0eb4e0 osd.0.log:408685:2011-02-16 11:39:35.847410 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 840 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3997131881 0 0) 0x2cd0430 con
> 0x7fd5cc0eb4e0 osd.0.log:409558:2011-02-16 11:39:36.850722 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 841 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2880002961 0 0) 0x2933750 con
> 0x7fd5cc0eb4e0 osd.0.log:410150:2011-02-16 11:39:38.058936 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 842 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (422669297 0 0) 0x7fd5b4026b80 con
> 0x7fd5cc0eb4e0 osd.0.log:410785:2011-02-16 11:39:38.951966 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 843 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2887960658 0 0) 0x7fd5b406d7e0 con
> 0x7fd5cc0eb4e0 osd.0.log:412279:2011-02-16 11:39:42.328051 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 844 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (888536454 0 0) 0x7fd5c4434bf0 con
> 0x7fd5cc0eb4e0 osd.0.log:413248:2011-02-16 11:39:44.591247 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 845 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3942869391 0 0) 0x7fd5c4422da0 con
> 0x7fd5cc0eb4e0 osd.0.log:414564:2011-02-16 11:39:46.587410 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 846 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (536010906 0 0) 0x2d0cc70 con
> 0x7fd5cc0eb4e0 osd.0.log:415059:2011-02-16 11:39:46.939610 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 847 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1952944560 0 0) 0x2cac660 con
> 0x7fd5cc0eb4e0             <== osd.0.log:418969:2011-02-16 11:40:13.382178
> 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87
> 172.17.40.31:6823/23945 848 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0
> (1901796990 0 0) 0x7fd5a0016010 con 0x7fd5cc0eb4e0   <== 26 second gap
> osd.0.log:419358:2011-02-16 11:40:13.423453 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 849 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (687989221 0 0) 0x7fd5a0018160 con
> 0x7fd5cc0eb4e0 osd.0.log:419573:2011-02-16 11:40:13.426088 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 850 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1942960184 0 0) 0x7fd5a0018340 con
> 0x7fd5cc0eb4e0 osd.0.log:420236:2011-02-16 11:40:13.599647 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 851 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1582392597 0 0) 0x7fd5a0018760 con
> 0x7fd5cc0eb4e0 osd.0.log:420431:2011-02-16 11:40:13.638125 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 852 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (652463144 0 0) 0x7fd5a00189f0 con
> 0x7fd5cc0eb4e0 osd.0.log:420737:2011-02-16 11:40:13.731877 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 853 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3298612745 0 0) 0x7fd5a0018d20 con
> 0x7fd5cc0eb4e0 osd.0.log:420907:2011-02-16 11:40:13.743052 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 854 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3389558799 0 0) 0x7fd5a1024010 con
> 0x7fd5cc0eb4e0 osd.0.log:423056:2011-02-16 11:40:20.484117 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 855 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (4197551345 0 0) 0x7fd5c4001b20 con
> 0x7fd5cc0eb4e0 osd.0.log:423342:2011-02-16 11:40:20.522679 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 856 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2447834242 0 0) 0x7fd5c40bac90 con
> 0x7fd5cc0eb4e0 osd.0.log:423565:2011-02-16 11:40:20.551600 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 857 ====
> osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3477883131 0 0) 0x7fd5c4456cb0 con
> 0x7fd5cc0eb4e0 osd.0.log:431891:2011-02-16 11:40:24.336220 7fd5d6ddf940
> -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 1 ====
> osd_ping(e0 as_of 14) v1 ==== 61+0+0 (981970893 0 0) 0x2b90d50 con
> 0x29a5e90 osd.0.log:431957:2011-02-16 11:40:24.620956 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 2 ====
> osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2171246889 0 0) 0x29069e0 con
> 0x29a5e90 osd.0.log:432146:2011-02-16 11:40:25.223233 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 3 ====
> osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2197848278 0 0) 0x8e7aa60 con
> 0x29a5e90 osd.0.log:432373:2011-02-16 11:40:26.126494 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 4 ====
> osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2473034353 0 0) 0x2caec80 con
> 0x29a5e90 osd.0.log:434792:2011-02-16 11:40:27.531515 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 5 ====
> osd_ping(e15 as_of 15) v1 ==== 61+0+0 (1656259643 0 0) 0x2e10370 con
> 0x29a5e90 osd.0.log:436542:2011-02-16 11:40:28.267676 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 6 ====
> osd_ping(e17 as_of 17) v1 ==== 61+0+0 (414112261 0 0) 0x2931800 con
> 0x29a5e90 osd.0.log:437755:2011-02-16 11:40:28.830618 7fd5d6ddf940 --
> 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 7 ====
> osd_ping(e17 as_of 17) v1 ==== 61+0+0 (1206729933 0 0) 0x29b5d80 con
> 0x29a5e90
> 
> 
> 
> # grep -nH -A 10000 "11:39:49.712017" osd.87.log | grep -B 10000
> "11:39:58.813658" osd.87.log:389490:2011-02-16 11:39:49.712017
> 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) sync_entry
> committing 2101 sync_epoch 114 osd.87.log-389491-2011-02-16
> 11:39:49.712521 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087)
> taking async snap 'snap_2101' osd.87.log-389492-2011-02-16
> 11:39:49.750398 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087)
> async snap create 'snap_2101' transid 134 got 0 Success
> osd.87.log-389493-2011-02-16 11:39:49.750455 7f0e01bae940 journal
> commit_started committing 2101, unblocking osd.87.log-389494-2011-02-16
> 11:39:49.750463 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) 
> waiting for transid 134 to complete osd.87.log-389495-2011-02-16
> 11:39:49.813245 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) 
> done waiting for transid 134 to complete osd.87.log-389496-2011-02-16
> 11:39:49.813306 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087)
> sync_entry commit took 0.101289 osd.87.log-389497-2011-02-16
> 11:39:49.813313 7f0e01bae940 journal commit_finish thru 2101
> osd.87.log-389498-2011-02-16 11:39:49.813320 7f0e01bae940 journal
> committed_thru 2101 (last_committed_seq 2073)                                                     
>                                                                                                         <==
> osd.87.log-389499-2011-02-16 11:39:58.800113 7f0dffbaa940 osd87 5
> pg[0.14bc( v 5'3 (5'1,5'3] n=3 ec=2 les=4 3/3/3) [87,72] r=0 luod=5'2
> lcod 5'2 mlcod 5'1 active+clean] update_stats 3'16         <== 9 second gap
> osd.87.log-389500-2011-02-16 11:39:58.800198 7f0dffbaa940 osd87 5
> pg[0.14bc( v 5'3 (5'1,5'3] n=3 ec=2 les=4 3/3/3) [87,72] r=0 luod=5'2
> lcod 5'2 mlcod 5'1 active+clean] eval_repop repgather(0x11493c30 applied
> 5'3 rep_tid=462 wfack=72 wfdisk=72,87 op=osd_op(client4232.1:72
> 10000009496.00000047 [write 0~4194304 [1@-1]] 0.94bc snapc 1=[]))
> wants=d osd.87.log-389501-2011-02-16 11:39:58.800233 7f0dffbaa940 osd87
> 5 pg[0.379( v 5'2 (0'0,5'2] n=2 ec=3 les=4 3/3/3) [77,87] r=1 luod=0'0
> lcod 0'0 active] sub_op_modify_applied on 0x2298400 op
> osd_sub_op(client4236.1:14 0.379 10000004a4b.0000000d/head [] v 5'2
> snapset=0=[]:[] snapc=0=[]) v3 osd.87.log-389502-2011-02-16
> 11:39:58.813597 7f0e01bae940 journal header: block_size 4096 alignment
> 4096 max_size 526385152 osd.87.log-389503-2011-02-16 11:39:58.813614
> 7f0e01bae940 journal header: start 86712320 osd.87.log-389504-2011-02-16
> 11:39:58.813620 7f0e01bae940 journal   write_pos 86712320
> osd.87.log-389505-2011-02-16 11:39:58.813626 7f0e01bae940 journal
> queue_completions_thru seq 2101 queueing seq 2084 0x7f0df86a0cc0
> osd.87.log-389506-2011-02-16 11:39:58.813643 7f0e01bae940 journal
> queue_completions_thru seq 2101 queueing seq 2085 0x7f0df8237e80
> osd.87.log-389507-2011-02-16 11:39:58.813651 7f0e01bae940 journal
> queue_completions_thru seq 2101 queueing seq 2086 0x7f0df81b6c90
> osd.87.log-389508-2011-02-16 11:39:58.813658 7f0e01bae940 journal
> queue_completions_thru seq 2101 queueing seq 2087 0x7f0df8150fb0
> 
> 
> # grep -nH -A 10000 "11:40:09.777220" osd.87.log | grep -B 10000
> "11:40:14.633381" osd.87.log:395664:2011-02-16 11:40:09.777220
> 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2129
> 0x7f0dd804f630 osd.87.log-395665-2011-02-16 11:40:09.777227 7f0e01bae940
> journal queue_completions_thru seq 2138 queueing seq 2130 0x7f0dc000b230
> osd.87.log-395666-2011-02-16 11:40:09.777233 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2131 0x7f0dd8000b40
> osd.87.log-395667-2011-02-16 11:40:09.777239 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2132 0x7f0dd80a5fe0
> osd.87.log-395668-2011-02-16 11:40:09.777245 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2133 0x7f0dd8013fe0
> osd.87.log-395669-2011-02-16 11:40:09.777250 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2134 0x7f0dd8055560
> osd.87.log-395670-2011-02-16 11:40:09.777258 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2135 0x7f0dd801da40
> osd.87.log-395671-2011-02-16 11:40:09.777264 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2136 0x7f0dc0013fe0
> osd.87.log-395672-2011-02-16 11:40:09.777270 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2137 0x7f0dd8005290
> osd.87.log-395673-2011-02-16 11:40:09.777277 7f0e01bae940 journal
> queue_completions_thru seq 2138 queueing seq 2138 0x7f0dd84c23f0
> osd.87.log-395674-2011-02-16 11:40:09.777285 7f0e01bae940 journal 
> dropping committed but unwritten seq 2124 len 4195464
> osd.87.log-395675-2011-02-16 11:40:09.777312 7f0e01bae940 journal 
> dropping committed but unwritten seq 2125 len 4195507
> osd.87.log-395676-2011-02-16 11:40:09.777339 7f0df7fff940 journal
> throttle: waited for bytes osd.87.log-395677-2011-02-16 11:40:09.777426
> 7f0e023af940 osd87 5 pg[0.f90( v 5'2 (0'0,5'2] n=2 ec=3 les=4 3/3/3)
> [45,87] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_commit on op
> osd_sub_op(client4196.1:130 0.f90 100000007d2.00000081/head [] v 5'2
> snapset=0=[]:[] snapc=0=[]) v3, sending commit to osd45
> osd.87.log-395678-2011-02-16 11:40:09.777493 7f0dfc9a2940 journal
> throttle: waited for bytes                                                                                           
>                                                                                                         <==
> osd.87.log-395679-2011-02-16 11:40:14.407019 7f0df36d9940 --
> 172.17.40.31:6821/23945 >> 172.17.40.35:6800/20993 pipe(0x22b7660 sd=42
> pgs=5 cs=1 l=1).reader couldn't read tag, Success                 <== 5 second
> gap osd.87.log-395680-2011-02-16 11:40:14.628871 7f0de3afa940 --
> 172.17.40.31:6821/23945 >> 172.17.40.42:0/4169731470 pipe(0x22e9880
> sd=129 pgs=6 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395681-2011-02-16 11:40:14.628926 7f0dd41c3940 --
> 172.17.40.31:6821/23945 >> 172.17.40.61:0/2426953348 pipe(0x2377140
> sd=190 pgs=56 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395682-2011-02-16 11:40:14.631264 7f0dd41c3940 --
> 172.17.40.31:6821/23945 >> 172.17.40.61:0/2426953348 pipe(0x2377140
> sd=190 pgs=56 cs=1 l=1).fault 0: Success osd.87.log-395683-2011-02-16
> 11:40:14.633050 7f0dd3fc1940 -- 172.17.40.31:6821/23945 >>
> 172.17.40.63:0/179283478 pipe(0x23776b0 sd=191 pgs=64 cs=1 l=1).reader
> couldn't read tag, Success osd.87.log-395684-2011-02-16 11:40:14.633130
> 7f0dd3fc1940 -- 172.17.40.31:6821/23945 >> 172.17.40.63:0/179283478
> pipe(0x23776b0 sd=191 pgs=64 cs=1 l=1).fault 0: Success
> osd.87.log-395685-2011-02-16 11:40:14.633210 7f0dde2aa940 --
> 172.17.40.31:6821/23945 >> 172.17.40.56:0/193342550 pipe(0x2385620
> sd=154 pgs=15 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395686-2011-02-16 11:40:14.633238 7f0dde2aa940 --
> 172.17.40.31:6821/23945 >> 172.17.40.56:0/193342550 pipe(0x2385620
> sd=154 pgs=15 cs=1 l=1).fault 0: Success osd.87.log-395687-2011-02-16
> 11:40:14.633308 7f0dde0a8940 -- 172.17.40.31:6821/23945 >>
> 172.17.40.66:0/2716831927 pipe(0x2317b20 sd=155 pgs=16 cs=1 l=1).reader
> couldn't read tag, Success osd.87.log-395688-2011-02-16 11:40:14.633335
> 7f0dde0a8940 -- 172.17.40.31:6821/23945 >> 172.17.40.66:0/2716831927
> pipe(0x2317b20 sd=155 pgs=16 cs=1 l=1).fault 0: Success
> osd.87.log-395689-2011-02-16 11:40:14.633381 7f0dd65e5940 --
> 172.17.40.31:6821/23945 >> 172.17.40.54:0/3922929349 pipe(0x22d49e0
> sd=174 pgs=45 cs=1 l=1).reader couldn't read tag, Success
> 
> 
> # grep -nH -A 10000 "11:39:56.765107" osd.0.log | grep -B 10000 "11:40:11.526593"
> osd.0.log:418422:2011-02-16 11:39:56.765107 7fd5d75e0940 osd0 5 pg[0.c26( v 5'5 (5'3,5'5] n=5 ec=2 les=4 3/3/3) [25,0] r=1 luod=0'0 lcod 5'4 active] enqueue_op 0x7fd5c404d300 osd_sub_op(client4210.1:231 0.c26 10000002b03.000000e4/head [] v 5'6 snapset=0=[]:[] snapc=0=[]) v3
> osd.0.log-418423-2011-02-16 11:39:56.765131 7fd5d75e0940 -- 172.17.40.21:6801/10701 <== osd39 172.17.40.25:6822/18259 44 ==== osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1 ==== 127+0+0 (699474326 0 0) 0x2bfe000 con 0x7fd5c40f5280
> osd.0.log-418424-2011-02-16 11:39:56.765144 7fd5d75e0940 osd0 5 _dispatch 0x2bfe000 osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1
> osd.0.log-418425-2011-02-16 11:39:56.765153 7fd5d75e0940 osd0 5 require_same_or_newer_map 5 (i am 5) 0x2bfe000
> osd.0.log-418426-2011-02-16 11:39:56.765160 7fd5d75e0940 osd0 5 _share_map_incoming osd39 172.17.40.25:6822/18259 5
> osd.0.log-418427-2011-02-16 11:39:56.765171 7fd5d75e0940 osd0 5 pg[0.f51( v 5'3 (0'0,5'3] n=3 ec=3 les=4 3/3/3) [0,39] r=0 luod=5'2 lcod 5'2 mlcod 0'0 active+clean] enqueue_op 0x2bfe000 osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1
> osd.0.log-418428-2011-02-16 11:39:56.765195 7fd5d75e0940 -- 172.17.40.21:6801/10701 <== osd69 172.17.40.29:6816/32415 36 ==== osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3 ==== 532+0+4194764 (1547976727 0 29215660) 0x7fd5b40750d0 con 0x2c91590
> osd.0.log-418429-2011-02-16 11:39:56.765210 7fd5d75e0940 osd0 5 _dispatch 0x7fd5b40750d0 osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3
> osd.0.log-418430-2011-02-16 11:39:56.765230 7fd5d75e0940 osd0 5 handle_sub_op osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3 epoch 5
> osd.0.log-418431-2011-02-16 11:39:56.765239 7fd5d75e0940 osd0 5 require_same_or_newer_map 5 (i am 5) 0x7fd5b40750d0
> osd.0.log-418432-2011-02-16 11:39:56.765247 7fd5d75e0940 osd0 5 _share_map_incoming osd69 172.17.40.29:6816/32415 5
> osd.0.log-418433-2011-02-16 11:39:56.767977 7fd5db5e8940 osd0 5 pg[0.b68( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [51,0] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_commit on op osd_sub_op(client4257.1:265 0.b68 10000006d7c.00000103/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3, sending commit to osd51
> osd.0.log-418434-2011-02-16 11:40:11.526201 7fd5dbde9940 journal queue_completions_thru seq 2253 queueing seq 2252 0x7fd5a0000ec0 <== 15 second gap
> osd.0.log-418435-2011-02-16 11:40:11.526229 7fd5dbde9940 journal queue_completions_thru seq 2253 queueing seq 2253 0x7fd59800f790
> osd.0.log-418436-2011-02-16 11:40:11.526242 7fd5dbde9940 journal write_thread throttle finished 3 ops and 12586409 bytes, now 5 ops and 20977433 bytes
> osd.0.log-418437-2011-02-16 11:40:11.526278 7fd5dbde9940 journal room 501166079 max_size 526385152 pos 127389696 header.start 102174720 top 4096
> osd.0.log-418438-2011-02-16 11:40:11.526285 7fd5dbde9940 journal check_for_full at 127389696 : 4202496 < 501166079
> osd.0.log-418439-2011-02-16 11:40:11.526291 7fd5dbde9940 journal prepare_single_write 1 will write 127389696 : seq 2254 len 4195516 -> 4202496 (head 40 pre_pad 3891 ebl 4195516 post_pad 3009 tail 40) (ebl alignment 3931)
> osd.0.log-418440-2011-02-16 11:40:11.526309 7fd5dbde9940 journal room 496963583 max_size 526385152 pos 131592192 header.start 102174720 top 4096
> osd.0.log-418441-2011-02-16 11:40:11.526316 7fd5dbde9940 journal check_for_full at 131592192 : 4202496 < 496963583
> osd.0.log-418442-2011-02-16 11:40:11.526321 7fd5dbde9940 journal prepare_single_write 2 will write 131592192 : seq 2255 len 4195464 -> 4202496 (head 40 pre_pad 3891 ebl 4195464 post_pad 3061 tail 40) (ebl alignment 3931)
> osd.0.log-418443-2011-02-16 11:40:11.526331 7fd5dbde9940 journal room 492761087 max_size 526385152 pos 135794688 header.start 102174720 top 4096
> osd.0.log-418444-2011-02-16 11:40:11.526337 7fd5dbde9940 journal check_for_full at 135794688 : 4202496 < 492761087
> osd.0.log-418445-2011-02-16 11:40:11.526343 7fd5dbde9940 journal prepare_single_write 3 will write 135794688 : seq 2256 len 4195446 -> 4202496 (head 40 pre_pad 3895 ebl 4195446 post_pad 3075 tail 40) (ebl alignment 3935)
> osd.0.log-418446-2011-02-16 11:40:11.526593 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd42 172.17.40.26:6808/8729 877 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3742513708 0 0) 0x7fd5c43e1c30 con 0x7fd5cc0e3d90
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at   http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-16 21:25 cosd multi-second stalls cause "wrongly marked me down" Jim Schutt
  2011-02-16 21:37 ` Wido den Hollander
@ 2011-02-16 21:40 ` Gregory Farnum
  2011-02-16 21:50   ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Gregory Farnum @ 2011-02-16 21:40 UTC (permalink / raw)
  To: Jim Schutt; +Cc: ceph-devel


On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote: 
> Hi,
> 
> I've been testing v0.24.3 w/ 64 clients against
> 1 mon, 1 mds, 96 osds. Under heavy write load I
> see:
>  [WRN] map e7 wrongly marked me down or wrong addr
> 
> I was able to sort through the logs and discover that when 
> this happens I have large gaps (10 seconds or more) in osd 
> heartbeat processing. In those heartbeat gaps I've discovered 
> long periods (5-15 seconds) where an osd logs nothing, even 
> though I am running with debug osd/filestore/journal = 20.
> 
> Is this a known issue?

You're running on btrfs? We've come across some issues involving very long sync times that I believe manifest like this. Sage is looking into them, although it's delayed at the moment thanks to FAST 11. :)
-Greg

> 
> 
> Below is what I've culled from my logs that show these gaps.
> Full logs available on request.
> 
> -- Jim
> 
> 
> # grep -n "wrongly" osd*.log | dshbak -c
> ----------------
> osd.17.log
> ----------------
> 472230:2011-02-16 11:40:29.340076 7fb6863d4940 log [WRN] : map e17 wrongly marked me down or wrong addr
> ----------------
> osd.46.log
> ----------------
> 489102:2011-02-16 11:40:45.756536 7f949e98c940 log [WRN] : map e25 wrongly marked me down or wrong addr
> ----------------
> osd.87.log
> ----------------
> 406661:2011-02-16 11:40:18.805586 7f0dfe3a7940 log [WRN] : map e7 wrongly marked me down or wrong addr
> ----------------
> osd.40.log
> ----------------
> 495401:2011-02-16 11:40:38.057711 7fa6681c5940 log [WRN] : map e21 wrongly marked me down or wrong addr
> 
> 
> # grep -n "no heartbeat from osd87" osd*.log | head -20 | dshbak -c
> ----------------
> osd.95.log
> ----------------
> 443261:2011-02-16 11:40:10.886318 7f4e5b53b940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:50.886145)
> 443308:2011-02-16 11:40:10.887379 7f4e63f4e940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:50.887216)
> 443865:2011-02-16 11:40:14.680998 7f4e5b53b940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:54.680931)
> 443893:2011-02-16 11:40:14.681824 7f4e63f4e940 osd95 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:49.424639 (cutoff 2011-02-16 11:39:54.681752)
> ----------------
> osd.17.log
> ----------------
> 440651:2011-02-16 11:40:13.740999 7fb6821ca940 osd17 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:53.724161 (cutoff 2011-02-16 11:39:53.740937)
> 440763:2011-02-16 11:40:13.744726 7fb68abdd940 osd17 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:53.724161 (cutoff 2011-02-16 11:39:53.744673)
> ----------------
> osd.46.log
> ----------------
> 439491:2011-02-16 11:40:08.860936 7f9495ffb940 osd46 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:48.265285 (cutoff 2011-02-16 11:39:48.860878)
> ----------------
> osd.33.log
> ----------------
> 428947:2011-02-16 11:40:26.894541 7ffbed20c940 osd33 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:40:05.244130 (cutoff 2011-02-16 11:40:06.894512)
> 428950:2011-02-16 11:40:26.894686 7ffbf5c1f940 osd33 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:40:05.244130 (cutoff 2011-02-16 11:40:06.894669)
> ----------------
> osd.73.log
> ----------------
> 394823:2011-02-16 11:40:08.649240 7f47b060d940 osd73 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:48.649213)
> 394835:2011-02-16 11:40:08.655061 7f47b9020940 osd73 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:48.655034)
> 395138:2011-02-16 11:40:12.720296 7f47b060d940 osd73 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.962027 (cutoff 2011-02-16 11:39:52.720253)
> ----------------
> osd.0.log
> ----------------
> 418554:2011-02-16 11:40:11.534834 7fd5d3bd7940 osd0 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16 11:39:51.534770)
> 418686:2011-02-16 11:40:11.568725 7fd5dc5ea940 osd0 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16 11:39:51.549753)
> 418964:2011-02-16 11:40:13.380898 7fd5d3bd7940 osd0 5 heartbeat_check: no heartbeat from osd87 since 2011-02-16 11:39:46.939692 (cutoff 2011-02-16 11:39:53.380831)
> 
> 
> # grep -nH tick osd.87.log | egrep "11:(39:[3-9]|40:[0-2])"
> osd.87.log:379692:2011-02-16 11:39:30.678342 7f0e033b1940 osd87 5 tick
> osd.87.log:380284:2011-02-16 11:39:31.678652 7f0e033b1940 osd87 5 tick
> osd.87.log:380974:2011-02-16 11:39:32.681217 7f0e033b1940 osd87 5 tick
> osd.87.log:381406:2011-02-16 11:39:33.681646 7f0e033b1940 osd87 5 tick
> osd.87.log:382004:2011-02-16 11:39:34.681930 7f0e033b1940 osd87 5 tick
> osd.87.log:382660:2011-02-16 11:39:35.682177 7f0e033b1940 osd87 5 tick
> osd.87.log:383068:2011-02-16 11:39:36.686511 7f0e033b1940 osd87 5 tick
> osd.87.log:383849:2011-02-16 11:39:37.686750 7f0e033b1940 osd87 5 tick
> osd.87.log:384487:2011-02-16 11:39:38.687127 7f0e033b1940 osd87 5 tick
> osd.87.log:384561:2011-02-16 11:39:39.687908 7f0e033b1940 osd87 5 tick
> osd.87.log:386015:2011-02-16 11:39:41.936988 7f0e033b1940 osd87 5 tick
> osd.87.log:386467:2011-02-16 11:39:44.322215 7f0e033b1940 osd87 5 tick
> osd.87.log:388404:2011-02-16 11:39:46.399688 7f0e033b1940 osd87 5 tick
> osd.87.log:389153:2011-02-16 11:39:47.400058 7f0e033b1940 osd87 5 tick
> osd.87.log:389484:2011-02-16 11:39:48.403479 7f0e033b1940 osd87 5 tick <==
> osd.87.log:392292:2011-02-16 11:40:00.338113 7f0e033b1940 osd87 5 tick <== 12 second gap
> osd.87.log:392903:2011-02-16 11:40:01.339041 7f0e033b1940 osd87 5 tick
> osd.87.log:392948:2011-02-16 11:40:02.339450 7f0e033b1940 osd87 5 tick
> osd.87.log:394922:2011-02-16 11:40:04.740211 7f0e033b1940 osd87 5 tick
> osd.87.log:395597:2011-02-16 11:40:06.063388 7f0e033b1940 osd87 5 tick
> osd.87.log:395623:2011-02-16 11:40:07.063841 7f0e033b1940 osd87 5 tick <==
> osd.87.log:398449:2011-02-16 11:40:16.109719 7f0e033b1940 osd87 5 tick <== 9 second gap
> osd.87.log:400131:2011-02-16 11:40:17.934761 7f0e033b1940 osd87 5 tick
> osd.87.log:410005:2011-02-16 11:40:21.725596 7f0e033b1940 osd87 7 tick
> osd.87.log:412432:2011-02-16 11:40:22.725940 7f0e033b1940 osd87 11 tick
> osd.87.log:427258:2011-02-16 11:40:24.524376 7f0e033b1940 osd87 14 tick
> osd.87.log:432187:2011-02-16 11:40:25.524614 7f0e033b1940 osd87 14 tick
> osd.87.log:434222:2011-02-16 11:40:26.524970 7f0e033b1940 osd87 14 tick
> osd.87.log:438352:2011-02-16 11:40:27.525224 7f0e033b1940 osd87 15 tick
> osd.87.log:444226:2011-02-16 11:40:28.526490 7f0e033b1940 osd87 17 tick
> osd.87.log:447127:2011-02-16 11:40:29.529372 7f0e033b1940 osd87 17 tick
> 
> 
> # egrep -nHe "--> osd0 " osd.87.log | grep osd_ping | egrep "11:(39:[3-9]|40:[0-2])"
> osd.87.log:379735:2011-02-16 11:39:30.930841 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50
> osd.87.log:380819:2011-02-16 11:39:32.232412 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df86c2df0
> osd.87.log:381277:2011-02-16 11:39:33.233895 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de4041c20
> osd.87.log:381924:2011-02-16 11:39:34.638658 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de401dda0
> osd.87.log:382680:2011-02-16 11:39:35.842033 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de404baf0
> osd.87.log:383126:2011-02-16 11:39:36.845583 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de406dc20
> osd.87.log:384121:2011-02-16 11:39:38.047605 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de4039c20
> osd.87.log:384547:2011-02-16 11:39:38.949228 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x786c470
> osd.87.log:386105:2011-02-16 11:39:42.216853 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de411f380
> osd.87.log:386366:2011-02-16 11:39:42.826071 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df808ac20
> osd.87.log:387864:2011-02-16 11:39:45.928938 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dc8401010
> osd.87.log:388691:2011-02-16 11:39:46.935489 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800d910 <==
> osd.87.log:389579:2011-02-16 11:39:58.999668 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800dd50 <== 12 second gap
> osd.87.log:391811:2011-02-16 11:39:59.604130 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0df82b1e50
> osd.87.log:392297:2011-02-16 11:40:00.405855 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x23121e0
> osd.87.log:392926:2011-02-16 11:40:02.098362 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd800b1d0
> osd.87.log:393945:2011-02-16 11:40:04.239303 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd803d6d0
> osd.87.log:395329:2011-02-16 11:40:05.240611 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0dd8033c80
> osd.87.log:395607:2011-02-16 11:40:06.142020 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50 <==
> osd.87.log:398446:2011-02-16 11:40:16.109618 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40c7e50 <== 10 second gap
> osd.87.log:399593:2011-02-16 11:40:16.712331 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de40d8e90
> osd.87.log:400034:2011-02-16 11:40:17.913637 7f0df67fc940 -- 172.17.40.31:6823/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f0de41b2c30
> osd.87.log:424702:2011-02-16 11:40:24.116636 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e0 as_of 14) v1 -- ?+0 0x7f0de404baf0
> osd.87.log:430175:2011-02-16 11:40:24.619852 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de40cec30
> osd.87.log:431874:2011-02-16 11:40:25.221671 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de404baf0
> osd.87.log:433563:2011-02-16 11:40:26.123080 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7f0de40cfc00
> osd.87.log:438357:2011-02-16 11:40:27.525371 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e15 as_of 15) v1 -- ?+0 0x7f0de412dc20
> osd.87.log:443114:2011-02-16 11:40:28.226793 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de40d3e10
> osd.87.log:445140:2011-02-16 11:40:28.829473 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de412ee20
> osd.87.log:446889:2011-02-16 11:40:29.431710 7f0df67fc940 -- 172.17.40.31:6825/23945 --> osd0 172.17.40.21:6802/10701 -- osd_ping(e17 as_of 17) v1 -- ?+0 0x7f0de40d3e10
> 
> 
> # grep -nH "<== osd87 " osd.0.log | grep osd_ping | egrep "11:(39:[3-9]|40:[0-2])"
> osd.0.log:405348:2011-02-16 11:39:30.934972 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 836 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3409844043 0 0) 0x2ce8260 con 0x7fd5cc0eb4e0
> osd.0.log:406113:2011-02-16 11:39:32.243478 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 837 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2347402994 0 0) 0x2d08250 con 0x7fd5cc0eb4e0
> osd.0.log:407084:2011-02-16 11:39:33.315517 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 838 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1541899367 0 0) 0x2ce8260 con 0x7fd5cc0eb4e0
> osd.0.log:407879:2011-02-16 11:39:34.643198 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 839 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3930157039 0 0) 0x2cb6990 con 0x7fd5cc0eb4e0
> osd.0.log:408685:2011-02-16 11:39:35.847410 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 840 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3997131881 0 0) 0x2cd0430 con 0x7fd5cc0eb4e0
> osd.0.log:409558:2011-02-16 11:39:36.850722 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 841 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2880002961 0 0) 0x2933750 con 0x7fd5cc0eb4e0
> osd.0.log:410150:2011-02-16 11:39:38.058936 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 842 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (422669297 0 0) 0x7fd5b4026b80 con 0x7fd5cc0eb4e0
> osd.0.log:410785:2011-02-16 11:39:38.951966 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 843 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2887960658 0 0) 0x7fd5b406d7e0 con 0x7fd5cc0eb4e0
> osd.0.log:412279:2011-02-16 11:39:42.328051 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 844 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (888536454 0 0) 0x7fd5c4434bf0 con 0x7fd5cc0eb4e0
> osd.0.log:413248:2011-02-16 11:39:44.591247 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 845 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3942869391 0 0) 0x7fd5c4422da0 con 0x7fd5cc0eb4e0
> osd.0.log:414564:2011-02-16 11:39:46.587410 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 846 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (536010906 0 0) 0x2d0cc70 con 0x7fd5cc0eb4e0
> osd.0.log:415059:2011-02-16 11:39:46.939610 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 847 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1952944560 0 0) 0x2cac660 con 0x7fd5cc0eb4e0 <==
> osd.0.log:418969:2011-02-16 11:40:13.382178 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 848 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1901796990 0 0) 0x7fd5a0016010 con 0x7fd5cc0eb4e0 <== 26 second gap
> osd.0.log:419358:2011-02-16 11:40:13.423453 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 849 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (687989221 0 0) 0x7fd5a0018160 con 0x7fd5cc0eb4e0
> osd.0.log:419573:2011-02-16 11:40:13.426088 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 850 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1942960184 0 0) 0x7fd5a0018340 con 0x7fd5cc0eb4e0
> osd.0.log:420236:2011-02-16 11:40:13.599647 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 851 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (1582392597 0 0) 0x7fd5a0018760 con 0x7fd5cc0eb4e0
> osd.0.log:420431:2011-02-16 11:40:13.638125 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 852 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (652463144 0 0) 0x7fd5a00189f0 con 0x7fd5cc0eb4e0
> osd.0.log:420737:2011-02-16 11:40:13.731877 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 853 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3298612745 0 0) 0x7fd5a0018d20 con 0x7fd5cc0eb4e0
> osd.0.log:420907:2011-02-16 11:40:13.743052 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 854 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3389558799 0 0) 0x7fd5a1024010 con 0x7fd5cc0eb4e0
> osd.0.log:423056:2011-02-16 11:40:20.484117 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 855 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (4197551345 0 0) 0x7fd5c4001b20 con 0x7fd5cc0eb4e0
> osd.0.log:423342:2011-02-16 11:40:20.522679 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 856 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (2447834242 0 0) 0x7fd5c40bac90 con 0x7fd5cc0eb4e0
> osd.0.log:423565:2011-02-16 11:40:20.551600 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6823/23945 857 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3477883131 0 0) 0x7fd5c4456cb0 con 0x7fd5cc0eb4e0
> osd.0.log:431891:2011-02-16 11:40:24.336220 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 1 ==== osd_ping(e0 as_of 14) v1 ==== 61+0+0 (981970893 0 0) 0x2b90d50 con 0x29a5e90
> osd.0.log:431957:2011-02-16 11:40:24.620956 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 2 ==== osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2171246889 0 0) 0x29069e0 con 0x29a5e90
> osd.0.log:432146:2011-02-16 11:40:25.223233 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 3 ==== osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2197848278 0 0) 0x8e7aa60 con 0x29a5e90
> osd.0.log:432373:2011-02-16 11:40:26.126494 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 4 ==== osd_ping(e14 as_of 14) v1 ==== 61+0+0 (2473034353 0 0) 0x2caec80 con 0x29a5e90
> osd.0.log:434792:2011-02-16 11:40:27.531515 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 5 ==== osd_ping(e15 as_of 15) v1 ==== 61+0+0 (1656259643 0 0) 0x2e10370 con 0x29a5e90
> osd.0.log:436542:2011-02-16 11:40:28.267676 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 6 ==== osd_ping(e17 as_of 17) v1 ==== 61+0+0 (414112261 0 0) 0x2931800 con 0x29a5e90
> osd.0.log:437755:2011-02-16 11:40:28.830618 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd87 172.17.40.31:6825/23945 7 ==== osd_ping(e17 as_of 17) v1 ==== 61+0+0 (1206729933 0 0) 0x29b5d80 con 0x29a5e90
> 
> 
> 
> # grep -nH -A 10000 "11:39:49.712017" osd.87.log | grep -B 10000 "11:39:58.813658"
> osd.87.log:389490:2011-02-16 11:39:49.712017 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) sync_entry committing 2101 sync_epoch 114
> osd.87.log-389491-2011-02-16 11:39:49.712521 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) taking async snap 'snap_2101'
> osd.87.log-389492-2011-02-16 11:39:49.750398 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) async snap create 'snap_2101' transid 134 got 0 Success
> osd.87.log-389493-2011-02-16 11:39:49.750455 7f0e01bae940 journal commit_started committing 2101, unblocking
> osd.87.log-389494-2011-02-16 11:39:49.750463 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) waiting for transid 134 to complete
> osd.87.log-389495-2011-02-16 11:39:49.813245 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) done waiting for transid 134 to complete
> osd.87.log-389496-2011-02-16 11:39:49.813306 7f0e01bae940 filestore(/ram/mnt/ceph/data.osd.0087) sync_entry commit took 0.101289
> osd.87.log-389497-2011-02-16 11:39:49.813313 7f0e01bae940 journal commit_finish thru 2101
> osd.87.log-389498-2011-02-16 11:39:49.813320 7f0e01bae940 journal committed_thru 2101 (last_committed_seq 2073) <==
> osd.87.log-389499-2011-02-16 11:39:58.800113 7f0dffbaa940 osd87 5 pg[0.14bc( v 5'3 (5'1,5'3] n=3 ec=2 les=4 3/3/3) [87,72] r=0 luod=5'2 lcod 5'2 mlcod 5'1 active+clean] update_stats 3'16 <== 9 second gap
> osd.87.log-389500-2011-02-16 11:39:58.800198 7f0dffbaa940 osd87 5 pg[0.14bc( v 5'3 (5'1,5'3] n=3 ec=2 les=4 3/3/3) [87,72] r=0 luod=5'2 lcod 5'2 mlcod 5'1 active+clean] eval_repop repgather(0x11493c30 applied 5'3 rep_tid=462 wfack=72 wfdisk=72,87 op=osd_op(client4232.1:72 10000009496.00000047 [write 0~4194304 [1@-1]] 0.94bc snapc 1=[])) wants=d
> osd.87.log-389501-2011-02-16 11:39:58.800233 7f0dffbaa940 osd87 5 pg[0.379( v 5'2 (0'0,5'2] n=2 ec=3 les=4 3/3/3) [77,87] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x2298400 op osd_sub_op(client4236.1:14 0.379 10000004a4b.0000000d/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3
> osd.87.log-389502-2011-02-16 11:39:58.813597 7f0e01bae940 journal header: block_size 4096 alignment 4096 max_size 526385152
> osd.87.log-389503-2011-02-16 11:39:58.813614 7f0e01bae940 journal header: start 86712320
> osd.87.log-389504-2011-02-16 11:39:58.813620 7f0e01bae940 journal write_pos 86712320
> osd.87.log-389505-2011-02-16 11:39:58.813626 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2084 0x7f0df86a0cc0
> osd.87.log-389506-2011-02-16 11:39:58.813643 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2085 0x7f0df8237e80
> osd.87.log-389507-2011-02-16 11:39:58.813651 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2086 0x7f0df81b6c90
> osd.87.log-389508-2011-02-16 11:39:58.813658 7f0e01bae940 journal queue_completions_thru seq 2101 queueing seq 2087 0x7f0df8150fb0
> 
> 
> # grep -nH -A 10000 "11:40:09.777220" osd.87.log | grep -B 10000 "11:40:14.633381"
> osd.87.log:395664:2011-02-16 11:40:09.777220 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2129 0x7f0dd804f630
> osd.87.log-395665-2011-02-16 11:40:09.777227 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2130 0x7f0dc000b230
> osd.87.log-395666-2011-02-16 11:40:09.777233 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2131 0x7f0dd8000b40
> osd.87.log-395667-2011-02-16 11:40:09.777239 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2132 0x7f0dd80a5fe0
> osd.87.log-395668-2011-02-16 11:40:09.777245 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2133 0x7f0dd8013fe0
> osd.87.log-395669-2011-02-16 11:40:09.777250 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2134 0x7f0dd8055560
> osd.87.log-395670-2011-02-16 11:40:09.777258 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2135 0x7f0dd801da40
> osd.87.log-395671-2011-02-16 11:40:09.777264 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2136 0x7f0dc0013fe0
> osd.87.log-395672-2011-02-16 11:40:09.777270 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2137 0x7f0dd8005290
> osd.87.log-395673-2011-02-16 11:40:09.777277 7f0e01bae940 journal queue_completions_thru seq 2138 queueing seq 2138 0x7f0dd84c23f0
> osd.87.log-395674-2011-02-16 11:40:09.777285 7f0e01bae940 journal dropping committed but unwritten seq 2124 len 4195464
> osd.87.log-395675-2011-02-16 11:40:09.777312 7f0e01bae940 journal dropping committed but unwritten seq 2125 len 4195507
> osd.87.log-395676-2011-02-16 11:40:09.777339 7f0df7fff940 journal throttle: waited for bytes
> osd.87.log-395677-2011-02-16 11:40:09.777426 7f0e023af940 osd87 5 pg[0.f90( v 5'2 (0'0,5'2] n=2 ec=3 les=4 3/3/3) [45,87] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_commit on op osd_sub_op(client4196.1:130 0.f90 100000007d2.00000081/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3, sending commit to osd45
> osd.87.log-395678-2011-02-16 11:40:09.777493 7f0dfc9a2940 journal throttle: waited for bytes <==
> osd.87.log-395679-2011-02-16 11:40:14.407019 7f0df36d9940 -- 172.17.40.31:6821/23945 >> 172.17.40.35:6800/20993 pipe(0x22b7660 sd=42 pgs=5 cs=1 l=1).reader couldn't read tag, Success <== 5 second gap
> osd.87.log-395680-2011-02-16 11:40:14.628871 7f0de3afa940 -- 172.17.40.31:6821/23945 >> 172.17.40.42:0/4169731470 pipe(0x22e9880 sd=129 pgs=6 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395681-2011-02-16 11:40:14.628926 7f0dd41c3940 -- 172.17.40.31:6821/23945 >> 172.17.40.61:0/2426953348 pipe(0x2377140 sd=190 pgs=56 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395682-2011-02-16 11:40:14.631264 7f0dd41c3940 -- 172.17.40.31:6821/23945 >> 172.17.40.61:0/2426953348 pipe(0x2377140 sd=190 pgs=56 cs=1 l=1).fault 0: Success
> osd.87.log-395683-2011-02-16 11:40:14.633050 7f0dd3fc1940 -- 172.17.40.31:6821/23945 >> 172.17.40.63:0/179283478 pipe(0x23776b0 sd=191 pgs=64 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395684-2011-02-16 11:40:14.633130 7f0dd3fc1940 -- 172.17.40.31:6821/23945 >> 172.17.40.63:0/179283478 pipe(0x23776b0 sd=191 pgs=64 cs=1 l=1).fault 0: Success
> osd.87.log-395685-2011-02-16 11:40:14.633210 7f0dde2aa940 -- 172.17.40.31:6821/23945 >> 172.17.40.56:0/193342550 pipe(0x2385620 sd=154 pgs=15 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395686-2011-02-16 11:40:14.633238 7f0dde2aa940 -- 172.17.40.31:6821/23945 >> 172.17.40.56:0/193342550 pipe(0x2385620 sd=154 pgs=15 cs=1 l=1).fault 0: Success
> osd.87.log-395687-2011-02-16 11:40:14.633308 7f0dde0a8940 -- 172.17.40.31:6821/23945 >> 172.17.40.66:0/2716831927 pipe(0x2317b20 sd=155 pgs=16 cs=1 l=1).reader couldn't read tag, Success
> osd.87.log-395688-2011-02-16 11:40:14.633335 7f0dde0a8940 -- 172.17.40.31:6821/23945 >> 172.17.40.66:0/2716831927 pipe(0x2317b20 sd=155 pgs=16 cs=1 l=1).fault 0: Success
> osd.87.log-395689-2011-02-16 11:40:14.633381 7f0dd65e5940 -- 172.17.40.31:6821/23945 >> 172.17.40.54:0/3922929349 pipe(0x22d49e0 sd=174 pgs=45 cs=1 l=1).reader couldn't read tag, Success
> 
> 
> # grep -nH -A 10000 "11:39:56.765107" osd.0.log | grep -B 10000 "11:40:11.526593"
> osd.0.log:418422:2011-02-16 11:39:56.765107 7fd5d75e0940 osd0 5 pg[0.c26( v 5'5 (5'3,5'5] n=5 ec=2 les=4 3/3/3) [25,0] r=1 luod=0'0 lcod 5'4 active] enqueue_op 0x7fd5c404d300 osd_sub_op(client4210.1:231 0.c26 10000002b03.000000e4/head [] v 5'6 snapset=0=[]:[] snapc=0=[]) v3
> osd.0.log-418423-2011-02-16 11:39:56.765131 7fd5d75e0940 -- 172.17.40.21:6801/10701 <== osd39 172.17.40.25:6822/18259 44 ==== osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1 ==== 127+0+0 (699474326 0 0) 0x2bfe000 con 0x7fd5c40f5280
> osd.0.log-418424-2011-02-16 11:39:56.765144 7fd5d75e0940 osd0 5 _dispatch 0x2bfe000 osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1
> osd.0.log-418425-2011-02-16 11:39:56.765153 7fd5d75e0940 osd0 5 require_same_or_newer_map 5 (i am 5) 0x2bfe000
> osd.0.log-418426-2011-02-16 11:39:56.765160 7fd5d75e0940 osd0 5 _share_map_incoming osd39 172.17.40.25:6822/18259 5
> osd.0.log-418427-2011-02-16 11:39:56.765171 7fd5d75e0940 osd0 5 pg[0.f51( v 5'3 (0'0,5'3] n=3 ec=3 les=4 3/3/3) [0,39] r=0 luod=5'2 lcod 5'2 mlcod 0'0 active+clean] enqueue_op 0x2bfe000 osd_sub_op_reply(client4237.1:85 0.f51 1000000cf3d.00000054/head [] ondisk = 0) v1
> osd.0.log-418428-2011-02-16 11:39:56.765195 7fd5d75e0940 -- 172.17.40.21:6801/10701 <== osd69 172.17.40.29:6816/32415 36 ==== osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3 ==== 532+0+4194764 (1547976727 0 29215660) 0x7fd5b40750d0 con 0x2c91590
> osd.0.log-418429-2011-02-16 11:39:56.765210 7fd5d75e0940 osd0 5 _dispatch 0x7fd5b40750d0 osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3
> osd.0.log-418430-2011-02-16 11:39:56.765230 7fd5d75e0940 osd0 5 handle_sub_op osd_sub_op(client4239.1:93 0.5d7 1000000f26e.0000005c/head [] v 5'2 snapset=0=[]:[] snapc=0=[]) v3 epoch 5
> osd.0.log-418431-2011-02-16 11:39:56.765239 7fd5d75e0940 osd0 5 require_same_or_newer_map 5 (i am 5) 0x7fd5b40750d0
> osd.0.log-418432-2011-02-16 11:39:56.765247 7fd5d75e0940 osd0 5 _share_map_incoming osd69 172.17.40.29:6816/32415 5
> osd.0.log-418433-2011-02-16 11:39:56.767977 7fd5db5e8940 osd0 5 pg[0.b68( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [51,0] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_commit on op osd_sub_op(client4257.1:265 0.b68 10000006d7c.00000103/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3, sending commit to osd51
> osd.0.log-418434-2011-02-16 11:40:11.526201 7fd5dbde9940 journal queue_completions_thru seq 2253 queueing seq 2252 0x7fd5a0000ec0 <== 15 second gap
> osd.0.log-418435-2011-02-16 11:40:11.526229 7fd5dbde9940 journal queue_completions_thru seq 2253 queueing seq 2253 0x7fd59800f790
> osd.0.log-418436-2011-02-16 11:40:11.526242 7fd5dbde9940 journal write_thread throttle finished 3 ops and 12586409 bytes, now 5 ops and 20977433 bytes
> osd.0.log-418437-2011-02-16 11:40:11.526278 7fd5dbde9940 journal room 501166079 max_size 526385152 pos 127389696 header.start 102174720 top 4096
> osd.0.log-418438-2011-02-16 11:40:11.526285 7fd5dbde9940 journal check_for_full at 127389696 : 4202496 < 501166079
> osd.0.log-418439-2011-02-16 11:40:11.526291 7fd5dbde9940 journal prepare_single_write 1 will write 127389696 : seq 2254 len 4195516 -> 4202496 (head 40 pre_pad 3891 ebl 4195516 post_pad 3009 tail 40) (ebl alignment 3931)
> osd.0.log-418440-2011-02-16 11:40:11.526309 7fd5dbde9940 journal room 496963583 max_size 526385152 pos 131592192 header.start 102174720 top 4096
> osd.0.log-418441-2011-02-16 11:40:11.526316 7fd5dbde9940 journal check_for_full at 131592192 : 4202496 < 496963583
> osd.0.log-418442-2011-02-16 11:40:11.526321 7fd5dbde9940 journal prepare_single_write 2 will write 131592192 : seq 2255 len 4195464 -> 4202496 (head 40 pre_pad 3891 ebl 4195464 post_pad 3061 tail 40) (ebl alignment 3931)
> osd.0.log-418443-2011-02-16 11:40:11.526331 7fd5dbde9940 journal room 492761087 max_size 526385152 pos 135794688 header.start 102174720 top 4096
> osd.0.log-418444-2011-02-16 11:40:11.526337 7fd5dbde9940 journal check_for_full at 135794688 : 4202496 < 492761087
> osd.0.log-418445-2011-02-16 11:40:11.526343 7fd5dbde9940 journal prepare_single_write 3 will write 135794688 : seq 2256 len 4195446 -> 4202496 (head 40 pre_pad 3895 ebl 4195446 post_pad 3075 tail 40) (ebl alignment 3935)
> osd.0.log-418446-2011-02-16 11:40:11.526593 7fd5d6ddf940 -- 172.17.40.21:6802/10701 <== osd42 172.17.40.26:6808/8729 877 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3742513708 0 0) 0x7fd5c43e1c30 con 0x7fd5cc0e3d90
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> 



* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-16 21:40 ` Gregory Farnum
@ 2011-02-16 21:50   ` Jim Schutt
  2011-02-17  0:50     ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-02-16 21:50 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel


On Wed, 2011-02-16 at 14:40 -0700, Gregory Farnum wrote:
> On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote:
> > Hi,
> >
> > I've been testing v0.24.3 w/ 64 clients against
> > 1 mon, 1 mds, 96 osds. Under heavy write load I
> > see:
> >  [WRN] map e7 wrongly marked me down or wrong addr
> >
> > I was able to sort through the logs and discover that when
> > this happens I have large gaps (10 seconds or more) in osd
> > heartbeat processing. In those heartbeat gaps I've discovered
> > long periods (5-15 seconds) where an osd logs nothing, even
> > though I am running with debug osd/filestore/journal = 20.
> >
> > Is this a known issue?
> 
> You're running on btrfs? 

Yep.

> We've come across some issues involving very long sync times that I believe manifest like this. Sage is looking into them, although it's delayed at the moment thanks to FAST 11. :)

OK, great.

-- Jim

> -Greg
> 





* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-16 21:37 ` Wido den Hollander
@ 2011-02-16 21:51   ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-16 21:51 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel


On Wed, 2011-02-16 at 14:37 -0700, Wido den Hollander wrote:
> Hi,
> ----- Original message -----
> > Hi,
> >
> > I've been testing v0.24.3 w/ 64 clients against
> > 1 mon, 1 mds, 96 osds.   Under heavy write load I
> > see:
> >     [WRN] map e7 wrongly marked me down or wrong addr
> 
> I'm seeing the same, for example when the cluster is recovering.
> 
> My thought is that it is something btrfs-related that the OSD is
> hanging on; do you see the same? (Check the stack of the process
> in /proc.) (Thanks, Colin!)
> 
> It showed me that it was stalling on btrfs ioctls.

Thanks, I'll try to see if I can catch this.

-- Jim

> 
> Wido




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-16 21:50   ` Jim Schutt
@ 2011-02-17  0:50     ` Sage Weil
  2011-02-17  0:54       ` Sage Weil
  2011-04-08 16:23       ` Jim Schutt
  0 siblings, 2 replies; 94+ messages in thread
From: Sage Weil @ 2011-02-17  0:50 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Wed, 16 Feb 2011, Jim Schutt wrote:
> On Wed, 2011-02-16 at 14:40 -0700, Gregory Farnum wrote:
> > On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote:
> > > Hi,
> > >
> > > I've been testing v0.24.3 w/ 64 clients against
> > > 1 mon, 1 mds, 96 osds. Under heavy write load I
> > > see:
> > >  [WRN] map e7 wrongly marked me down or wrong addr
> > >
> > > I was able to sort through the logs and discover that when
> > > this happens I have large gaps (10 seconds or more) in osd
> > > heartbeat processing. In those heartbeat gaps I've discovered
> > > long periods (5-15 seconds) where an osd logs nothing, even
> > > though I am running with debug osd/filestore/journal = 20.
> > >
> > > Is this a known issue?
> > 
> > You're running on btrfs? 
> 
> Yep.

Are the cosd log files on the same btrfs volume as the btrfs data, or 
elsewhere?  The heartbeat thread takes some pains to avoid any locks that 
may be contended and to avoid any disk I/O, so in theory a btrfs stall 
shouldn't affect anything.  We may have missed something... do you have a 
log showing this in action?
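
For illustration only (this is not the actual cosd code, and the names
below are made up): a minimal sketch of the pattern described above,
where the heartbeat loop only try-locks the shared map state, so it can
never block behind a thread that is stuck in disk I/O:

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

std::mutex map_mutex;              // stand-in for the shared osdmap lock
std::atomic<bool> stop{false};

// Heartbeat loop: never block on the shared lock; if it is busy,
// skip the map-dependent bookkeeping but still send the pings.
void heartbeat_loop() {
  while (!stop) {
    if (map_mutex.try_lock()) {    // non-blocking attempt, like try_get_read()
      // ... refresh the peer set from the current map ...
      map_mutex.unlock();
    }
    // ... send osd_ping messages to the known peers (no disk I/O here) ...
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}

int main() {
  std::thread hb(heartbeat_loop);
  std::this_thread::sleep_for(std::chrono::seconds(5));
  stop = true;
  hb.join();
}

With that structure a slow sync in another thread should only cost the
heartbeat its map refresh, not its pings.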

sage


> 
> > We've come across some issues involving very long sync times that I believe manifest like this. Sage is looking into them, although it's delayed at the moment thanks to FAST 11. :)
> 
> OK, great.
> 
> -- Jim
> 
> > -Greg
> > 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17  0:50     ` Sage Weil
@ 2011-02-17  0:54       ` Sage Weil
  2011-02-17 15:46         ` Jim Schutt
  2011-04-08 16:23       ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-02-17  0:54 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Wed, 16 Feb 2011, Sage Weil wrote:
> shouldn't affect anything.  We may have missed something.. do you have a 
> log showing this in action?

Obviously yes, looking at your original email.  :)  At the beginning of 
each log line we include a thread id.  What would be really helpful would 
be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
are blocking, either based on the existing output, or by adding additional 
dout lines at interesting points in time.
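
As an illustration of the kind of instrumentation meant here (a
standalone sketch, not the cosd dout macro; all names below are
hypothetical): logging a timestamp before and after each step makes a
multi-second gap point directly at the blocking call.

#include <chrono>
#include <iostream>
#include <string>

// Print a wall-clock timestamp (ms) plus a marker; two consecutive markers
// with a large time difference identify the step that stalled.
static void trace(const std::string& where) {
  using namespace std::chrono;
  auto ms = duration_cast<milliseconds>(
      system_clock::now().time_since_epoch()).count();
  std::cerr << ms << " heartbeat: " << where << std::endl;
}

void heartbeat_step() {
  trace("taking heartbeat_lock");
  // ... take the lock ...
  trace("got heartbeat_lock, sending pings");
  // ... send the pings ...
  trace("done");
}

int main() { heartbeat_step(); }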

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17  0:54       ` Sage Weil
@ 2011-02-17 15:46         ` Jim Schutt
  2011-02-17 16:11           ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-02-17 15:46 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi Sage,

On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> On Wed, 16 Feb 2011, Sage Weil wrote:
> > shouldn't affect anything.  We may have missed something.. do you have a 
> > log showing this in action?
> 
> Obviously yes, looking at your original email.  :)  At the beginning of 
> each log line we include a thread id.  What would be really helpful would 
> be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> are blocking, either based on the existing output, or by adding additional 
> dout lines at interesting points in time.

I'll take a deeper look at my existing logs with
that in mind; let me know if you'd like me to
send you some.

I have also been looking at map_lock, as it seems
to be shared between the heartbeat and map update
threads.

Would instrumenting acquiring/releasing that lock
be helpful?  Is there some other lock that may
be more fruitful to instrument?  I can reproduce 
pretty reliably, so adding instrumentation is 
no problem.

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17 15:46         ` Jim Schutt
@ 2011-02-17 16:11           ` Sage Weil
  2011-02-17 23:31             ` Jim Schutt
  2011-02-23 17:52             ` Jim Schutt
  0 siblings, 2 replies; 94+ messages in thread
From: Sage Weil @ 2011-02-17 16:11 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 17 Feb 2011, Jim Schutt wrote:
> Hi Sage,
> 
> On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > shouldn't affect anything.  We may have missed something.. do you have a 
> > > log showing this in action?
> > 
> > Obviously yes, looking at your original email.  :)  At the beginning of 
> > each log line we include a thread id.  What would be really helpful would 
> > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > are blocking, either based on the existing output, or by adding additional 
> > dout lines at interesting points in time.
> 
> I'll take a deeper look at my existing logs with
> that in mind; let me know if you'd like me to
> send you some.
> 
> I have also been looking at map_lock, as it seems
> to be shared between the heartbeat and map update
> threads.
> 
> Would instrumenting acquiring/releasing that lock
> be helpful?  Is there some other lock that may
> be more fruitful to instrument?  I can reproduce 
> pretty reliably, so adding instrumentation is 
> no problem.

The heartbeat thread is doing a map_lock.try_get_read() because that lock 
is frequently held by another thread, so the heartbeat shouldn't ever block there.

The possibilities I see are:
 - peer_stat_lock
 - the monc->sub_want / renew_subs calls (monc has an internal lock), 
although that code should only trigger with a single osd.  :/
 - heartbeat_lock itself could be held by another thread; I'd instrument 
all locks/unlocks there, along with the wakeup in heartbeat().
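
A minimal sketch of that sort of lock instrumentation, written against
std::mutex rather than the Ceph Mutex class, purely to illustrate the
idea of logging slow acquisitions and long hold times (the 100 ms
threshold and the names are arbitrary):

#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <utility>

// Wraps a mutex so that slow acquisitions and long hold times get logged.
class TimedLock {
  std::mutex& m_;
  std::string name_;
  std::chrono::steady_clock::time_point acquired_;
 public:
  TimedLock(std::mutex& m, std::string name) : m_(m), name_(std::move(name)) {
    auto t0 = std::chrono::steady_clock::now();
    m_.lock();
    acquired_ = std::chrono::steady_clock::now();
    auto wait_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        acquired_ - t0).count();
    if (wait_ms > 100)
      std::cerr << name_ << ": waited " << wait_ms << " ms to acquire\n";
  }
  ~TimedLock() {
    auto held_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - acquired_).count();
    m_.unlock();
    if (held_ms > 100)
      std::cerr << name_ << ": held for " << held_ms << " ms\n";
  }
};

std::mutex heartbeat_mutex;        // stand-in for heartbeat_lock

void do_heartbeat_work() {
  TimedLock guard(heartbeat_mutex, "heartbeat_lock");
  // ... work normally done under the lock ...
}

int main() { do_heartbeat_work(); }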

Thanks for looking at this!
sage



> 
> -- Jim
> 
> > 
> > sage
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17 16:11           ` Sage Weil
@ 2011-02-17 23:31             ` Jim Schutt
  2011-02-18  7:13               ` Sage Weil
  2011-03-09 16:02               ` Jim Schutt
  2011-02-23 17:52             ` Jim Schutt
  1 sibling, 2 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-17 23:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
> > Hi Sage,
> > 
> > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > shouldn't affect anything.  We may have missed something.. do you have a 
> > > > log showing this in action?
> > > 
> > > Obviously yes, looking at your original email.  :)  At the beginning of 
> > > each log line we include a thread id.  What would be really helpful would 
> > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > are blocking, either based on the existing output, or by adding additional 
> > > dout lines at interesting points in time.
> > 
> > I'll take a deeper look at my existing logs with
> > that in mind; let me know if you'd like me to
> > send you some.
> > 
> > I have also been looking at map_lock, as it seems
> > to be shared between the heartbeat and map update
> > threads.
> > 
> > Would instrumenting acquiring/releasing that lock
> > be helpful?  Is there some other lock that may
> > be more fruitful to instrument?  I can reproduce 
> > pretty reliably, so adding instrumentation is 
> > no problem.
> 
> The heartbeat thread is doing a map_lock.try_get_read() because it 
> frequently is held by another thread, so that shouldn't ever block. 
> 
> The possibilities I see are:
>  - peer_stat_lock
>  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> although that code should only trigger with a single osd.  :/
>  - heartbeat_lock itself could be held by another thread; i'd instrument 
> all locks/unlocks there, along with the wakeup in heartbeat().

If I did the instrumentation right, there's no sign that
any of these locks are contended.

So, I decided to instrument OSD::tick() like this:

---
 src/osd/OSD.cc |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 76b8af8..dab6054 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -1530,8 +1530,10 @@ void OSD::tick()
 
   // periodically kick recovery work queue
   recovery_tp.kick();
-  
+
+  dout(20) << "tick getting read lock on map_lock" << dendl;
   map_lock.get_read();
+  dout(20) << "tick got read lock on map_lock" << dendl;
 
   if (scrub_should_schedule()) {
     sched_scrub();
@@ -1544,11 +1546,13 @@ void OSD::tick()
   check_replay_queue();
 
   // mon report?
+  dout(20) << "tick sending mon report" << dendl;
   utime_t now = g_clock.now();
   if (now - last_mon_report > g_conf.osd_mon_report_interval)
     do_mon_report();
 
   // remove stray pgs?
+  dout(20) << "tick removing stray pgs" << dendl;
   remove_list_lock.Lock();
   for (map<epoch_t, map<int, vector<pg_t> > >::iterator p = remove_list.begin();
        p != remove_list.end();
@@ -1566,19 +1570,23 @@ void OSD::tick()
 
   map_lock.put_read();
 
+  dout(20) << "tick sending log to logclient" << dendl;
   logclient.send_log();
 
+  dout(20) << "tick arming timer for next tick" << dendl;
   timer.add_event_after(1.0, new C_Tick(this));
 
   // only do waiters if dispatch() isn't currently running.  (if it is,
   // it'll do the waiters, and doing them here may screw up ordering
   // of op_queue vs handle_osd_map.)
+  dout(20) << "tick checking dispatch queue status" << dendl;
   if (!dispatch_running) {
     dispatch_running = true;
     do_waiters();
     dispatch_running = false;
     dispatch_cond.Signal();
   }
+  dout(20) << "tick done" << dendl;
 }
 
Check out the result:

osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
osd.68.log:256280:2011-02-17 15:45:18.481705 7fd42ad57940 osd68 5 tick getting read lock on map_lock
osd.68.log:256281:2011-02-17 15:45:18.481712 7fd42ad57940 osd68 5 tick got read lock on map_lock
osd.68.log:256688:2011-02-17 15:45:20.010705 7fd42ad57940 osd68 5 tick sending mon report
osd.68.log:256753:2011-02-17 15:45:20.012950 7fd42ad57940 osd68 5 tick removing stray pgs
osd.68.log:256754:2011-02-17 15:45:20.012959 7fd42ad57940 osd68 5 tick sending log to logclient
osd.68.log:256755:2011-02-17 15:45:20.012965 7fd42ad57940 osd68 5 tick arming timer for next tick
osd.68.log:256756:2011-02-17 15:45:20.012976 7fd42ad57940 osd68 5 tick checking dispatch queue status
osd.68.log:256757:2011-02-17 15:45:20.012993 7fd42ad57940 osd68 5 tick done

Why should it take 28 seconds to add a new timer event?
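
One generic way that can happen, shown here with a toy timer rather than
the actual SafeTimer code (so this is only a possible mechanism, not a
diagnosis): if registering a new event has to take the same mutex that a
slow callback is holding, add_event_after() waits out the entire stall.

#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Toy timer: callback execution and event registration share one mutex,
// so a callback that blocks also blocks anyone scheduling a new event.
class ToyTimer {
  std::mutex lock_;
 public:
  void run_callback(const std::function<void()>& cb) {
    std::lock_guard<std::mutex> g(lock_);
    cb();                                   // if cb blocks (e.g. on I/O)...
  }
  void add_event_after(double /*seconds*/, std::function<void()> /*cb*/) {
    std::lock_guard<std::mutex> g(lock_);   // ...this waits out the stall
    // ... queue the event ...
  }
};

int main() {
  ToyTimer t;
  std::thread slow([&t] {
    t.run_callback([] { std::this_thread::sleep_for(std::chrono::seconds(3)); });
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  t.add_event_after(1.0, [] {});            // blocks ~3 s behind the callback
  slow.join();
}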

Here's the full log spanning that 28 second gap in tick():

osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
osd.68.log-256028-2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
osd.68.log-256029-2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
osd.68.log-256030-2011-02-17 15:44:50.141540 7fd42ad57940 osd68 5 scrub_should_schedule loadavg 32.73 >= max 1.25 = no, load too high
osd.68.log-256031-2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
osd.68.log-256032-2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
osd.68.log-256033-2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
osd.68.log-256034-2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick
osd.68.log-256035-2011-02-17 15:44:52.813591 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.002862
osd.68.log-256036-2011-02-17 15:44:52.813653 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3069
osd.68.log-256037-2011-02-17 15:44:52.813661 7fd429554940 journal commit_start committing 3082, still blocked
osd.68.log-256038-2011-02-17 15:44:52.813666 7fd429554940 journal commit_start
osd.68.log-256039-2011-02-17 15:44:52.813672 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry committing 3082 sync_epoch 48
osd.68.log-256040-2011-02-17 15:44:52.813945 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) taking async snap 'snap_3082'
osd.68.log-256041-2011-02-17 15:44:53.693394 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) async snap create 'snap_3082' transid 65 got 0 Success
osd.68.log-256042-2011-02-17 15:44:53.693437 7fd429554940 journal commit_started committing 3082, unblocking
osd.68.log-256043-2011-02-17 15:44:53.693445 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068)  waiting for transid 65 to complete
osd.68.log-256044-2011-02-17 15:44:53.751214 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068)  done waiting for transid 65 to complete
osd.68.log-256045-2011-02-17 15:44:53.751286 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry commit took 0.937614
osd.68.log-256046-2011-02-17 15:44:53.751295 7fd429554940 journal commit_finish thru 3082
osd.68.log-256047-2011-02-17 15:44:55.793734 7fd42554c940 -- 172.17.40.29:6814/12558 <== osd29 172.17.40.24:6817/11926 286 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (4012807742 0 0) 0x7fd404061cf0 con 0xeef270
osd.68.log-256048-2011-02-17 15:44:55.793804 7fd42554c940 osd68 5 heartbeat_dispatch 0x7fd404061cf0
osd.68.log-256049-2011-02-17 15:44:55.793811 7fd42554c940 osd68 5 handle_osd_ping from osd29 got stat stat(2011-02-17 15:44:49.760084 oprate=7.39631 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
osd.68.log-256050-2011-02-17 15:44:55.936072 7fd42a556940 journal write_thread_entry going to sleep
osd.68.log-256051-2011-02-17 15:44:55.936131 7fd429554940 journal committed_thru 3082 (last_committed_seq 3069)
osd.68.log-256052-2011-02-17 15:44:55.936142 7fd429554940 journal header: block_size 4096 alignment 4096 max_size 526385152
osd.68.log-256053-2011-02-17 15:44:55.936149 7fd429554940 journal header: start 499519488
osd.68.log-256054-2011-02-17 15:44:55.936154 7fd429554940 journal  write_pos 499519488
osd.68.log-256055-2011-02-17 15:44:55.936160 7fd429554940 journal committed_thru done
osd.68.log-256056-2011-02-17 15:44:55.936175 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) removing snap 'snap_3038'
osd.68.log-256057-2011-02-17 15:44:56.030632 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry committed to op_seq 3082
osd.68.log-256058-2011-02-17 15:44:56.030670 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
osd.68.log-256059-2011-02-17 15:44:57.763019 7fd425d4d940 -- 172.17.40.29:6813/12558 <== osd31 172.17.40.24:6822/12076 34 ==== osd_sub_op_reply(client4196.1:143 0.b03 100000003e9.0000008e/head [] ack = 0) v1 ==== 127+0+0 (603260922 0 0) 0x7fd41c005fe0 con 0xe2ec20
osd.68.log-256060-2011-02-17 15:45:01.030890 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.000218
osd.68.log-256061-2011-02-17 15:45:01.030964 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3082
osd.68.log-256062-2011-02-17 15:45:01.030970 7fd429554940 journal commit_start nothing to do
osd.68.log-256063-2011-02-17 15:45:01.030976 7fd429554940 journal commit_start
osd.68.log-256064-2011-02-17 15:45:01.030986 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
osd.68.log-256065-2011-02-17 15:45:06.031075 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.000087
osd.68.log-256066-2011-02-17 15:45:06.031104 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3082
osd.68.log-256067-2011-02-17 15:45:06.031114 7fd429554940 journal commit_start nothing to do
osd.68.log-256068-2011-02-17 15:45:06.031122 7fd429554940 journal commit_start
osd.68.log-256069-2011-02-17 15:45:06.031134 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
osd.68.log-256070-2011-02-17 15:45:10.642177 7fd40adee940 -- 172.17.40.29:6812/12558 >> 172.17.40.58:0/3091507398 pipe(0x7fd3fc0ed010 sd=214 pgs=90 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256071-2011-02-17 15:45:10.642274 7fd40adee940 -- 172.17.40.29:6812/12558 >> 172.17.40.58:0/3091507398 pipe(0x7fd3fc0ed010 sd=214 pgs=90 cs=1 l=1).fault 0: Success
osd.68.log-256072-2011-02-17 15:45:10.642431 7fd40aff0940 -- 172.17.40.29:6812/12558 >> 172.17.40.74:0/2165351138 pipe(0xdbac20 sd=156 pgs=6 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256073-2011-02-17 15:45:10.642461 7fd40aff0940 -- 172.17.40.29:6812/12558 >> 172.17.40.74:0/2165351138 pipe(0xdbac20 sd=156 pgs=6 cs=1 l=1).fault 0: Success
osd.68.log-256074-2011-02-17 15:45:10.642606 7fd3fb0f2940 -- 172.17.40.29:6812/12558 >> 172.17.40.57:0/579985139 pipe(0x105ec30 sd=198 pgs=65 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256075-2011-02-17 15:45:10.642670 7fd3fb0f2940 -- 172.17.40.29:6812/12558 >> 172.17.40.57:0/579985139 pipe(0x105ec30 sd=198 pgs=65 cs=1 l=1).fault 0: Success
osd.68.log-256076-2011-02-17 15:45:10.642790 7fd408cd9940 -- 172.17.40.29:6812/12558 >> 172.17.40.65:0/2115694822 pipe(0x1032d20 sd=161 pgs=10 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256077-2011-02-17 15:45:10.642818 7fd408cd9940 -- 172.17.40.29:6812/12558 >> 172.17.40.65:0/2115694822 pipe(0x1032d20 sd=161 pgs=10 cs=1 l=1).fault 0: Success
osd.68.log-256078-2011-02-17 15:45:10.643238 7fd3faff1940 -- 172.17.40.29:6812/12558 >> 172.17.40.66:0/892415451 pipe(0xc7fe50 sd=199 pgs=69 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256079-2011-02-17 15:45:10.643268 7fd3faff1940 -- 172.17.40.29:6812/12558 >> 172.17.40.66:0/892415451 pipe(0xc7fe50 sd=199 pgs=69 cs=1 l=1).fault 0: Success
osd.68.log-256080-2011-02-17 15:45:10.643291 7fd4025e5940 -- 172.17.40.29:6812/12558 >> 172.17.40.70:0/2135188772 pipe(0xf88c20 sd=176 pgs=44 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256081-2011-02-17 15:45:10.643362 7fd4025e5940 -- 172.17.40.29:6812/12558 >> 172.17.40.70:0/2135188772 pipe(0xf88c20 sd=176 pgs=44 cs=1 l=1).fault 0: Success
osd.68.log-256082-2011-02-17 15:45:10.643507 7fd40a4e9940 -- 172.17.40.29:6812/12558 >> 172.17.40.62:0/59132854 pipe(0xd72cf0 sd=157 pgs=5 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256083-2011-02-17 15:45:10.643622 7fd40a4e9940 -- 172.17.40.29:6812/12558 >> 172.17.40.62:0/59132854 pipe(0xd72cf0 sd=157 pgs=5 cs=1 l=1).fault 0: Success
osd.68.log-256084-2011-02-17 15:45:10.643870 7fd409ce3940 -- 172.17.40.29:6812/12558 >> 172.17.40.84:0/103596728 pipe(0xca56a0 sd=159 pgs=5 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256085-2011-02-17 15:45:10.643902 7fd409ce3940 -- 172.17.40.29:6812/12558 >> 172.17.40.84:0/103596728 pipe(0xca56a0 sd=159 pgs=5 cs=1 l=1).fault 0: Success
osd.68.log-256086-2011-02-17 15:45:10.644044 7fd3f88d2940 -- 172.17.40.29:6812/12558 >> 172.17.40.77:0/3894807147 pipe(0x7fd3fc008fc0 sd=210 pgs=83 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256087-2011-02-17 15:45:10.644084 7fd3f88d2940 -- 172.17.40.29:6812/12558 >> 172.17.40.77:0/3894807147 pipe(0x7fd3fc008fc0 sd=210 pgs=83 cs=1 l=1).fault 0: Success
osd.68.log-256088-2011-02-17 15:45:10.644274 7fd4009c9940 -- 172.17.40.29:6812/12558 >> 172.17.40.63:0/118294342 pipe(0xd53c20 sd=190 pgs=44 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256089-2011-02-17 15:45:10.644302 7fd4009c9940 -- 172.17.40.29:6812/12558 >> 172.17.40.63:0/118294342 pipe(0xd53c20 sd=190 pgs=44 cs=1 l=1).fault 0: Success
osd.68.log-256090-2011-02-17 15:45:10.644446 7fd40abec940 -- 172.17.40.29:6812/12558 >> 172.17.40.76:0/3517121089 pipe(0x7fd3fc5de370 sd=215 pgs=86 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256091-2011-02-17 15:45:10.644509 7fd40abec940 -- 172.17.40.29:6812/12558 >> 172.17.40.76:0/3517121089 pipe(0x7fd3fc5de370 sd=215 pgs=86 cs=1 l=1).fault 0: Success
osd.68.log-256092-2011-02-17 15:45:10.644602 7fd3f9bdd940 -- 172.17.40.29:6812/12558 >> 172.17.40.55:0/1343812630 pipe(0x7fd3fc008250 sd=209 pgs=75 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256093-2011-02-17 15:45:10.644636 7fd3f9bdd940 -- 172.17.40.29:6812/12558 >> 172.17.40.55:0/1343812630 pipe(0x7fd3fc008250 sd=209 pgs=75 cs=1 l=1).fault 0: Success
osd.68.log-256094-2011-02-17 15:45:10.645375 7fd401bdb940 -- 172.17.40.29:6812/12558 >> 172.17.40.72:0/658453639 pipe(0xe3ad20 sd=182 pgs=28 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256095-2011-02-17 15:45:10.645394 7fd401bdb940 -- 172.17.40.29:6812/12558 >> 172.17.40.72:0/658453639 pipe(0xe3ad20 sd=182 pgs=28 cs=1 l=1).fault 0: Success
osd.68.log-256096-2011-02-17 15:45:10.645714 7fd4033f3940 -- 172.17.40.29:6812/12558 >> 172.17.40.85:0/2266176512 pipe(0x1028c50 sd=169 pgs=9 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256097-2011-02-17 15:45:10.645740 7fd4033f3940 -- 172.17.40.29:6812/12558 >> 172.17.40.85:0/2266176512 pipe(0x1028c50 sd=169 pgs=9 cs=1 l=1).fault 0: Success
osd.68.log-256098-2011-02-17 15:45:10.646231 7fd3fa0e2940 -- 172.17.40.29:6812/12558 >> 172.17.40.103:0/1723900743 pipe(0x7fd3fc002d70 sd=206 pgs=85 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256099-2011-02-17 15:45:10.646304 7fd3fa0e2940 -- 172.17.40.29:6812/12558 >> 172.17.40.103:0/1723900743 pipe(0x7fd3fc002d70 sd=206 pgs=85 cs=1 l=1).fault 0: Success
osd.68.log-256100-2011-02-17 15:45:10.646372 7fd3f42ae940 -- 172.17.40.29:6812/12558 >> 172.17.40.101:0/1743123736 pipe(0x7fd3fc019720 sd=211 pgs=40 cs=1 l=1).reader couldn't read tag, Success
osd.68.log-256101-2011-02-17 15:45:10.646397 7fd3f42ae940 -- 172.17.40.29:6812/12558 >> 172.17.40.101:0/1743123736 pipe(0x7fd3fc019720 sd=211 pgs=40 cs=1 l=1).fault 0: Success
osd.68.log-256102-2011-02-17 15:45:10.844357 7fd410747940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6822/11833 pipe(0x7fd41c03bc20 sd=115 pgs=65 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256103-2011-02-17 15:45:10.844395 7fd410747940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6822/11833 pipe(0x7fd41c03bc20 sd=115 pgs=65 cs=1 l=0).fault 0: Success
osd.68.log-256104-2011-02-17 15:45:10.844418 7fd410747940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6822/11833 pipe(0x7fd41c03bc20 sd=115 pgs=65 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256105-2011-02-17 15:45:11.031203 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.000067
osd.68.log-256106-2011-02-17 15:45:11.031233 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3082
osd.68.log-256107-2011-02-17 15:45:11.031240 7fd429554940 journal commit_start nothing to do
osd.68.log-256108-2011-02-17 15:45:11.031246 7fd429554940 journal commit_start
osd.68.log-256109-2011-02-17 15:45:11.031254 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
osd.68.log-256110-2011-02-17 15:45:11.193386 7fd4183c3940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6801/12830 pipe(0xeba000 sd=55 pgs=53 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256111-2011-02-17 15:45:11.193424 7fd4183c3940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6801/12830 pipe(0xeba000 sd=55 pgs=53 cs=1 l=0).fault 0: Success
osd.68.log-256112-2011-02-17 15:45:11.193451 7fd4183c3940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6801/12830 pipe(0xeba000 sd=55 pgs=53 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256113-2011-02-17 15:45:11.291656 7fd4199d9940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6810/12606 pipe(0xec3c20 sd=43 pgs=48 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256114-2011-02-17 15:45:11.291738 7fd4199d9940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6810/12606 pipe(0xec3c20 sd=43 pgs=48 cs=1 l=0).fault 0: Success
osd.68.log-256115-2011-02-17 15:45:11.291766 7fd4199d9940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6810/12606 pipe(0xec3c20 sd=43 pgs=48 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256116-2011-02-17 15:45:11.699576 7fd40b3f4940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6820/12848 pipe(0x7fd41c095d30 sd=150 pgs=110 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256117-2011-02-17 15:45:15.907490 7fd40b3f4940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6820/12848 pipe(0x7fd41c095d30 sd=150 pgs=110 cs=1 l=0).fault 0: Success
osd.68.log-256118-2011-02-17 15:45:15.907579 7fd40b3f4940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6820/12848 pipe(0x7fd41c095d30 sd=150 pgs=110 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256119-2011-02-17 15:45:15.907594 7fd416faf940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6804/12907 pipe(0xeef800 sd=65 pgs=59 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256120-2011-02-17 15:45:15.907616 7fd416faf940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6804/12907 pipe(0xeef800 sd=65 pgs=59 cs=1 l=0).fault 0: Success
osd.68.log-256121-2011-02-17 15:45:15.907637 7fd416faf940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6804/12907 pipe(0xeef800 sd=65 pgs=59 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256122-2011-02-17 15:45:15.907655 7fd417ebe940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6813/13164 pipe(0xed7260 sd=59 pgs=60 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256123-2011-02-17 15:45:15.907675 7fd417ebe940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6813/13164 pipe(0xed7260 sd=59 pgs=60 cs=1 l=0).fault 0: Success
osd.68.log-256124-2011-02-17 15:45:15.907695 7fd417ebe940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6813/13164 pipe(0xed7260 sd=59 pgs=60 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256125-2011-02-17 15:45:15.907734 7fd412363940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6814/13164 pipe(0x7fd41c116c30 sd=70 pgs=127 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256126-2011-02-17 15:45:15.907763 7fd412363940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6814/13164 pipe(0x7fd41c116c30 sd=70 pgs=127 cs=1 l=0).fault 0: Success
osd.68.log-256127-2011-02-17 15:45:15.907814 7fd412363940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6814/13164 pipe(0x7fd41c116c30 sd=70 pgs=127 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256128-2011-02-17 15:45:15.907829 7fd420433940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6819/11475 pipe(0xd4fd40 sd=23 pgs=54 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256129-2011-02-17 15:45:15.907850 7fd420433940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6819/11475 pipe(0xd4fd40 sd=23 pgs=54 cs=1 l=0).fault 0: Success
osd.68.log-256130-2011-02-17 15:45:15.907871 7fd420433940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6819/11475 pipe(0xd4fd40 sd=23 pgs=54 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256131-2011-02-17 15:45:15.909670 7fd40cc0c940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6805/12657 pipe(0x7fd41c378cc0 sd=82 pgs=111 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256132-2011-02-17 15:45:15.910080 7fd40cc0c940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6805/12657 pipe(0x7fd41c378cc0 sd=82 pgs=111 cs=1 l=0).fault 0: Success
osd.68.log-256133-2011-02-17 15:45:15.910108 7fd40cc0c940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6805/12657 pipe(0x7fd41c378cc0 sd=82 pgs=111 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256134-2011-02-17 15:45:15.910127 7fd411e5e940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6820/11475 pipe(0x7fd41c161d60 sd=86 pgs=131 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256135-2011-02-17 15:45:15.910194 7fd411e5e940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6820/11475 pipe(0x7fd41c161d60 sd=86 pgs=131 cs=1 l=0).fault 0: Success
osd.68.log-256136-2011-02-17 15:45:15.910215 7fd411e5e940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6820/11475 pipe(0x7fd41c161d60 sd=86 pgs=131 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256137-2011-02-17 15:45:15.910228 7fd420938940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6810/11227 pipe(0xf20c30 sd=20 pgs=57 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256138-2011-02-17 15:45:15.910247 7fd420938940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6810/11227 pipe(0xf20c30 sd=20 pgs=57 cs=1 l=0).fault 0: Success
osd.68.log-256139-2011-02-17 15:45:15.910266 7fd420938940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6810/11227 pipe(0xf20c30 sd=20 pgs=57 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256140-2011-02-17 15:45:15.910291 7fd410040940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6816/13237 pipe(0x7fd41c31ac20 sd=116 pgs=70 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256141-2011-02-17 15:45:15.910311 7fd410040940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6816/13237 pipe(0x7fd41c31ac20 sd=116 pgs=70 cs=1 l=0).fault 0: Success
osd.68.log-256142-2011-02-17 15:45:15.910332 7fd410040940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6816/13237 pipe(0x7fd41c31ac20 sd=116 pgs=70 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256143-2011-02-17 15:45:15.910350 7fd40bafb940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6802/12017 pipe(0x7fd41c02db90 sd=143 pgs=129 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256144-2011-02-17 15:45:15.910597 7fd40bafb940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6802/12017 pipe(0x7fd41c02db90 sd=143 pgs=129 cs=1 l=0).fault 0: Success
osd.68.log-256145-2011-02-17 15:45:15.910618 7fd40bafb940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6802/12017 pipe(0x7fd41c02db90 sd=143 pgs=129 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256146-2011-02-17 15:45:15.910667 7fd40c909940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6820/11999 pipe(0x7fd41c141ce0 sd=110 pgs=106 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256147-2011-02-17 15:45:15.912873 7fd40c909940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6820/11999 pipe(0x7fd41c141ce0 sd=110 pgs=106 cs=1 l=0).fault 0: Success
osd.68.log-256148-2011-02-17 15:45:15.912926 7fd40c909940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6820/11999 pipe(0x7fd41c141ce0 sd=110 pgs=106 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256149-2011-02-17 15:45:15.912944 7fd41befe940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6822/11561 pipe(0xd70c20 sd=25 pgs=58 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256150-2011-02-17 15:45:15.912965 7fd41befe940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6822/11561 pipe(0xd70c20 sd=25 pgs=58 cs=1 l=0).fault 0: Success
osd.68.log-256151-2011-02-17 15:45:15.912986 7fd41befe940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6822/11561 pipe(0xd70c20 sd=25 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256152-2011-02-17 15:45:15.913111 7fd4195d5940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6822/12076 pipe(0xc72d50 sd=46 pgs=58 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256153-2011-02-17 15:45:15.913133 7fd4195d5940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6822/12076 pipe(0xc72d50 sd=46 pgs=58 cs=1 l=0).fault 0: Success
osd.68.log-256154-2011-02-17 15:45:15.913156 7fd4195d5940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6822/12076 pipe(0xc72d50 sd=46 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256155-2011-02-17 15:45:15.913184 7fd40b8f9940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6820/13948 pipe(0x7fd41c10fb90 sd=142 pgs=129 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256156-2011-02-17 15:45:15.914376 7fd40b8f9940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6820/13948 pipe(0x7fd41c10fb90 sd=142 pgs=129 cs=1 l=0).fault 0: Success
osd.68.log-256157-2011-02-17 15:45:15.914399 7fd40b8f9940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6820/13948 pipe(0x7fd41c10fb90 sd=142 pgs=129 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256158-2011-02-17 15:45:15.914423 7fd412c6c940 -- 172.17.40.29:6813/12558 >> 172.17.40.32:6816/12882 pipe(0xea1f90 sd=98 pgs=61 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256159-2011-02-17 15:45:15.914443 7fd412c6c940 -- 172.17.40.29:6813/12558 >> 172.17.40.32:6816/12882 pipe(0xea1f90 sd=98 pgs=61 cs=1 l=0).fault 0: Success
osd.68.log-256160-2011-02-17 15:45:15.914462 7fd412c6c940 -- 172.17.40.29:6813/12558 >> 172.17.40.32:6816/12882 pipe(0xea1f90 sd=98 pgs=61 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256161-2011-02-17 15:45:15.914479 7fd40f838940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6804/13532 pipe(0x7fd41c06bc20 sd=119 pgs=71 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256162-2011-02-17 15:45:15.914499 7fd40f838940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6804/13532 pipe(0x7fd41c06bc20 sd=119 pgs=71 cs=1 l=0).fault 0: Success
osd.68.log-256163-2011-02-17 15:45:15.914519 7fd40f838940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6804/13532 pipe(0x7fd41c06bc20 sd=119 pgs=71 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256164-2011-02-17 15:45:15.914532 7fd40bbfc940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6817/12882 pipe(0x7fd41c1fad30 sd=136 pgs=119 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256165-2011-02-17 15:45:15.914935 7fd40bbfc940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6817/12882 pipe(0x7fd41c1fad30 sd=136 pgs=119 cs=1 l=0).fault 0: Success
osd.68.log-256166-2011-02-17 15:45:15.914957 7fd40bbfc940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6817/12882 pipe(0x7fd41c1fad30 sd=136 pgs=119 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256167-2011-02-17 15:45:15.914971 7fd40b6f7940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6817/13853 pipe(0x7fd41c186ba0 sd=137 pgs=132 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256168-2011-02-17 15:45:15.915453 7fd40b6f7940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6817/13853 pipe(0x7fd41c186ba0 sd=137 pgs=132 cs=1 l=0).fault 0: Success
osd.68.log-256169-2011-02-17 15:45:15.915467 7fd41a6e6940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6808/12993 pipe(0xe14c20 sd=37 pgs=90 cs=1 l=0).fault 0: Success
osd.68.log-256170-2011-02-17 15:45:15.915507 7fd41a6e6940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6808/12993 pipe(0xe14c20 sd=37 pgs=90 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256171-2011-02-17 15:45:15.915542 7fd421241940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6808/11429 pipe(0xc63000 sd=15 pgs=136 cs=1 l=0).fault 107: Transport endpoint is not connected
osd.68.log-256172-2011-02-17 15:45:15.915574 7fd421241940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6808/11429 pipe(0xc63000 sd=15 pgs=136 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256173-2011-02-17 15:45:15.915644 7fd421140940 -- 172.17.40.29:6813/12558 >> 172.17.40.29:6810/12473 pipe(0xcb0300 sd=14 pgs=1 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256174-2011-02-17 15:45:15.915778 7fd421140940 -- 172.17.40.29:6813/12558 >> 172.17.40.29:6810/12473 pipe(0xcb0300 sd=14 pgs=1 cs=1 l=0).fault 0: Success
osd.68.log-256175-2011-02-17 15:45:15.915826 7fd421140940 -- 172.17.40.29:6813/12558 >> 172.17.40.29:6810/12473 pipe(0xcb0300 sd=14 pgs=1 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256176-2011-02-17 15:45:15.915927 7fd40e929940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6802/13458 pipe(0x7fd41c228c20 sd=135 pgs=121 cs=1 l=0).fault 0: Success
osd.68.log-256177-2011-02-17 15:45:15.915964 7fd40e929940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6802/13458 pipe(0x7fd41c228c20 sd=135 pgs=121 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256178-2011-02-17 15:45:15.915994 7fd415797940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6805/11346 pipe(0x7fd41c28fc20 sd=60 pgs=120 cs=1 l=0).fault 0: Success
osd.68.log-256179-2011-02-17 15:45:15.916021 7fd415797940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6805/11346 pipe(0x7fd41c28fc20 sd=60 pgs=120 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256180-2011-02-17 15:45:15.916283 7fd413676940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6811/11754 pipe(0x7fd41c265c20 sd=41 pgs=112 cs=1 l=0).fault 0: Success
osd.68.log-256181-2011-02-17 15:45:15.916313 7fd413676940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6811/11754 pipe(0x7fd41c265c20 sd=41 pgs=112 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256182-2011-02-17 15:45:15.916456 7fd415595940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6807/13607 pipe(0x7fd41c033c20 sd=120 pgs=77 cs=1 l=0).fault 0: Success
osd.68.log-256183-2011-02-17 15:45:15.916480 7fd415595940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6807/13607 pipe(0x7fd41c033c20 sd=120 pgs=77 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256184-2011-02-17 15:45:15.918620 7fd4197d7940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6823/12076 pipe(0xcafc20 sd=45 pgs=94 cs=1 l=0).fault 0: Success
osd.68.log-256185-2011-02-17 15:45:15.920192 7fd4197d7940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6823/12076 pipe(0xcafc20 sd=45 pgs=94 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256186-2011-02-17 15:45:15.920214 7fd41aaea940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6807/12993 pipe(0xdd4c20 sd=35 pgs=54 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256187-2011-02-17 15:45:15.920234 7fd41aaea940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6807/12993 pipe(0xdd4c20 sd=35 pgs=54 cs=1 l=0).fault 0: Success
osd.68.log-256188-2011-02-17 15:45:15.920254 7fd41aaea940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6807/12993 pipe(0xdd4c20 sd=35 pgs=54 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256189-2011-02-17 15:45:15.920276 7fd40ca0a940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6808/12636 pipe(0x7fd41c32ac20 sd=107 pgs=120 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256190-2011-02-17 15:45:15.920303 7fd40ca0a940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6808/12636 pipe(0x7fd41c32ac20 sd=107 pgs=120 cs=1 l=0).fault 0: Success
osd.68.log-256191-2011-02-17 15:45:15.920323 7fd40ca0a940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6808/12636 pipe(0x7fd41c32ac20 sd=107 pgs=120 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256192-2011-02-17 15:45:15.924700 7fd4176b6940 -- 172.17.40.29:6813/12558 >> 172.17.40.25:6804/12657 pipe(0xee10c0 sd=61 pgs=61 cs=1 l=0).fault 0: Success
osd.68.log-256193-2011-02-17 15:45:15.924802 7fd4176b6940 -- 172.17.40.29:6813/12558 >> 172.17.40.25:6804/12657 pipe(0xee10c0 sd=61 pgs=61 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256194-2011-02-17 15:45:15.924854 7fd412565940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6819/12848 pipe(0xd35b90 sd=101 pgs=58 cs=1 l=0).fault 0: Success
osd.68.log-256195-2011-02-17 15:45:15.924875 7fd412565940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6819/12848 pipe(0xd35b90 sd=101 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256196-2011-02-17 15:45:15.925126 7fd4015d5940 -- 172.17.40.29:6812/12558 >> 172.17.40.50:0/2223503458 pipe(0xec9940 sd=185 pgs=40 cs=1 l=1).fault 0: Success
osd.68.log-256197-2011-02-17 15:45:15.926469 7fd4171b1940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6817/11926 pipe(0xeef000 sd=64 pgs=84 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256198-2011-02-17 15:45:18.465361 7fd40b7f8940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6814/12798 pipe(0x7fd41c3c0d30 sd=152 pgs=116 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256199-2011-02-17 15:45:18.472367 7fd40b7f8940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6814/12798 pipe(0x7fd41c3c0d30 sd=152 pgs=116 cs=1 l=0).fault 0: Success
osd.68.log-256200-2011-02-17 15:45:18.472401 7fd40b7f8940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6814/12798 pipe(0x7fd41c3c0d30 sd=152 pgs=116 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256201-2011-02-17 15:45:18.472450 7fd418fcf940 -- 172.17.40.29:6813/12558 >> 172.17.40.28:6810/12265 pipe(0xf1b050 sd=51 pgs=64 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256202-2011-02-17 15:45:18.472472 7fd418fcf940 -- 172.17.40.29:6813/12558 >> 172.17.40.28:6810/12265 pipe(0xf1b050 sd=51 pgs=64 cs=1 l=0).fault 0: Success
osd.68.log-256203-2011-02-17 15:45:18.472523 7fd418fcf940 -- 172.17.40.29:6813/12558 >> 172.17.40.28:6810/12265 pipe(0xf1b050 sd=51 pgs=64 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256204-2011-02-17 15:45:18.472547 7fd411a5a940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6811/12265 pipe(0x7fd41c169cb0 sd=77 pgs=129 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256205-2011-02-17 15:45:18.472580 7fd411a5a940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6811/12265 pipe(0x7fd41c169cb0 sd=77 pgs=129 cs=1 l=0).fault 0: Success
osd.68.log-256206-2011-02-17 15:45:18.472602 7fd411a5a940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6811/12265 pipe(0x7fd41c169cb0 sd=77 pgs=129 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256207-2011-02-17 15:45:18.472615 7fd40b9fa940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6823/13041 pipe(0x7fd41c1ceb60 sd=138 pgs=111 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256208-2011-02-17 15:45:18.474761 7fd40b9fa940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6823/13041 pipe(0x7fd41c1ceb60 sd=138 pgs=111 cs=1 l=0).fault 0: Success
osd.68.log-256209-2011-02-17 15:45:18.474810 7fd40b9fa940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6823/13041 pipe(0x7fd41c1ceb60 sd=138 pgs=111 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256210-2011-02-17 15:45:18.477408 7fd42654e940 -- 172.17.40.29:6812/12558 <== client4240 172.17.40.73:0/3624355571 1 ==== osd_op(client4240.1:111 10000004e34.0000006e [write 0~4194304 [1@-1]] 0.a156 snapc 1=[]) ==== 128+0+4194304 (2756195295 0 0) 0xe43f80 con 0x7fd3fc001c00
osd.68.log-256211-2011-02-17 15:45:18.479547 7fd413c7c940 -- 172.17.40.29:6814/12558 >> 172.17.40.27:6823/13093 pipe(0x7fd41c15fc30 sd=90 pgs=97 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256212-2011-02-17 15:45:18.479635 7fd40e626940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6804/11346 pipe(0x7fd41c13cc20 sd=113 pgs=72 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256213-2011-02-17 15:45:18.479736 7fd40e626940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6804/11346 pipe(0x7fd41c13cc20 sd=113 pgs=72 cs=1 l=0).fault 0: Success
osd.68.log-256214-2011-02-17 15:45:18.479759 7fd40e626940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6804/11346 pipe(0x7fd41c13cc20 sd=113 pgs=72 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256215-2011-02-17 15:45:18.479776 7fd420d3c940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6807/11429 pipe(0xc63870 sd=17 pgs=4 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256216-2011-02-17 15:45:18.479805 7fd420d3c940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6807/11429 pipe(0xc63870 sd=17 pgs=4 cs=1 l=0).fault 0: Success
osd.68.log-256217-2011-02-17 15:45:18.479827 7fd420d3c940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6807/11429 pipe(0xc63870 sd=17 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256218-2011-02-17 15:45:18.479846 7fd419cdc940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6822/12924 pipe(0xe8fc20 sd=42 pgs=58 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256219-2011-02-17 15:45:18.479867 7fd419cdc940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6822/12924 pipe(0xe8fc20 sd=42 pgs=58 cs=1 l=0).fault 0: Success
osd.68.log-256220-2011-02-17 15:45:18.479898 7fd419cdc940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6822/12924 pipe(0xe8fc20 sd=42 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256221-2011-02-17 15:45:18.479915 7fd40f333940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6819/13310 pipe(0x7fd41c007c20 sd=117 pgs=75 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256222-2011-02-17 15:45:18.479936 7fd40f333940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6819/13310 pipe(0x7fd41c007c20 sd=117 pgs=75 cs=1 l=0).fault 0: Success
osd.68.log-256223-2011-02-17 15:45:18.479955 7fd40f333940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6819/13310 pipe(0x7fd41c007c20 sd=117 pgs=75 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256224-2011-02-17 15:45:18.479973 7fd41a2e2940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6804/11594 pipe(0x7fd41c154c20 sd=122 pgs=71 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256225-2011-02-17 15:45:18.480025 7fd41a2e2940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6804/11594 pipe(0x7fd41c154c20 sd=122 pgs=71 cs=1 l=0).fault 0: Success
osd.68.log-256226-2011-02-17 15:45:18.480044 7fd41a2e2940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6804/11594 pipe(0x7fd41c154c20 sd=122 pgs=71 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256227-2011-02-17 15:45:18.480074 7fd40d212940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6823/12924 pipe(0x7fd41c3fcc30 sd=104 pgs=120 cs=1 l=0).reader couldn't read tag, Success
osd.68.log-256228-2011-02-17 15:45:18.480107 7fd40d212940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6823/12924 pipe(0x7fd41c3fcc30 sd=104 pgs=120 cs=1 l=0).fault 0: Success
osd.68.log-256229-2011-02-17 15:45:18.480127 7fd40d212940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6823/12924 pipe(0x7fd41c3fcc30 sd=104 pgs=120 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256230-2011-02-17 15:45:18.480154 7fd427550940 osd68 5 pg[0.7fc( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [3,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd404067010 op osd_sub_op(client4258.1:35 0.7fc 10000008109.00000022/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
osd.68.log-256231-2011-02-17 15:45:18.480271 7fd427550940 osd68 5 pg[0.680( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [90,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd404031e40 op osd_sub_op(client4224.1:33 0.680 1000000ac0c.00000020/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
osd.68.log-256232-2011-02-17 15:45:18.480310 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] op_applied repgather(0x7fd3fc0eb480 applying 5'2 rep_tid=174 wfack=27,68 wfdisk=27 op=osd_op(client4244.1:126 1000000dee1.0000007d [write 0~4194304 [1@-1]] 0.588 snapc 1=[]))
osd.68.log-256233-2011-02-17 15:45:18.480356 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] op_applied mode was rmw(wr=1)
osd.68.log-256234-2011-02-17 15:45:18.480371 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] op_applied mode now idle(wr=0 WAKE) (finish_write)
osd.68.log-256235-2011-02-17 15:45:18.480384 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] put_object_context 1000000dee1.0000007d/head 1 -> 0
osd.68.log-256236-2011-02-17 15:45:18.480399 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] put_snapset_context 1000000dee1.0000007d 1 -> 0
osd.68.log-256237-2011-02-17 15:45:18.480435 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] update_stats 3'13
osd.68.log-256238-2011-02-17 15:45:18.480451 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] eval_repop repgather(0x7fd3fc0eb480 applied 5'2 rep_tid=174 wfack=27 wfdisk=27 op=osd_op(client4244.1:126 1000000dee1.0000007d [write 0~4194304 [1@-1]] 0.588 snapc 1=[])) wants=d
osd.68.log-256239-2011-02-17 15:45:18.480471 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] op_applied repgather(0x7fd404013bf0 applying 5'2 rep_tid=175 wfack=31,68 wfdisk=31 op=osd_op(client4220.1:59 1000000ea9c.0000003a [write 0~4194304 [1@-1]] 0.cb03 snapc 1=[]))
osd.68.log-256240-2011-02-17 15:45:18.480497 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] op_applied mode was rmw(wr=1)
osd.68.log-256241-2011-02-17 15:45:18.480510 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] op_applied mode now idle(wr=0 WAKE) (finish_write)
osd.68.log-256242-2011-02-17 15:45:18.480523 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] put_object_context 1000000ea9c.0000003a/head 1 -> 0
osd.68.log-256243-2011-02-17 15:45:18.480536 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] put_snapset_context 1000000ea9c.0000003a 1 -> 0
osd.68.log-256244-2011-02-17 15:45:18.480555 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] update_stats 3'13
osd.68.log-256245-2011-02-17 15:45:18.480593 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] eval_repop repgather(0x7fd404013bf0 applied 5'2 rep_tid=175 wfack=31 wfdisk=31 op=osd_op(client4220.1:59 1000000ea9c.0000003a [write 0~4194304 [1@-1]] 0.cb03 snapc 1=[])) wants=d
osd.68.log-256246-2011-02-17 15:45:18.480612 7fd427550940 osd68 5 pg[0.b45( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [76,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd3fc031200 op osd_sub_op(client4221.1:17 0.b45 1000000aff5.00000010/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
osd.68.log-256247-2011-02-17 15:45:18.480652 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] op_applied repgather(0x7fd404021370 applying 5'2 rep_tid=176 wfack=55,68 wfdisk=55 op=osd_op(client4246.1:15 10000007165.0000000e [write 0~4194304 [1@-1]] 0.4bcb snapc 1=[]))
osd.68.log-256248-2011-02-17 15:45:18.480681 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] op_applied mode was rmw(wr=1)
osd.68.log-256249-2011-02-17 15:45:18.480694 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] op_applied mode now idle(wr=0 WAKE) (finish_write)
osd.68.log-256250-2011-02-17 15:45:18.480706 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] put_object_context 10000007165.0000000e/head 1 -> 0
osd.68.log-256251-2011-02-17 15:45:18.480719 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] put_snapset_context 10000007165.0000000e 1 -> 0
osd.68.log-256252-2011-02-17 15:45:18.480738 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] update_stats 3'13
osd.68.log-256253-2011-02-17 15:45:18.480752 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] eval_repop repgather(0x7fd404021370 applied 5'2 rep_tid=176 wfack=55 wfdisk=55 op=osd_op(client4246.1:15 10000007165.0000000e [write 0~4194304 [1@-1]] 0.4bcb snapc 1=[])) wants=d
osd.68.log-256254-2011-02-17 15:45:18.480770 7fd427550940 osd68 5 pg[0.36b( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [21,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd3fc0a9050 op osd_sub_op(client4213.1:82 0.36b 10000005dd8.00000051/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
osd.68.log-256255-2011-02-17 15:45:18.480811 7fd427550940 osd68 5 pg[0.38d( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [5,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd40403f770 op osd_sub_op(client4256.1:52 0.38d 1000000b7c7.00000033/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
osd.68.log-256256-2011-02-17 15:45:18.480901 7fd40db1b940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6820/13310 pipe(0xf5dc30 sd=133 pgs=123 cs=1 l=0).reader couldn't read tag, Transport endpoint is not connected
osd.68.log-256257-2011-02-17 15:45:18.480929 7fd40db1b940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6820/13310 pipe(0xf5dc30 sd=133 pgs=123 cs=1 l=0).fault 107: Transport endpoint is not connected
osd.68.log-256258-2011-02-17 15:45:18.480949 7fd40db1b940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6820/13310 pipe(0xf5dc30 sd=133 pgs=123 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256259-2011-02-17 15:45:18.481001 7fd4167a7940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6805/11056 pipe(0xf06700 sd=68 pgs=90 cs=1 l=0).fault 0: Success
osd.68.log-256260-2011-02-17 15:45:18.481037 7fd4167a7940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6805/11056 pipe(0xf06700 sd=68 pgs=90 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256261-2011-02-17 15:45:18.481059 7fd413979940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6817/13237 pipe(0x7fd41c1fcc00 sd=34 pgs=124 cs=1 l=0).fault 0: Success
osd.68.log-256262-2011-02-17 15:45:18.481085 7fd413979940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6817/13237 pipe(0x7fd41c1fcc00 sd=34 pgs=124 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256263-2011-02-17 15:45:18.481107 7fd40d919940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6823/11833 pipe(0x7fd41c37cc20 sd=146 pgs=116 cs=1 l=0).fault 0: Success
osd.68.log-256264-2011-02-17 15:45:18.481139 7fd40d919940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6823/11833 pipe(0x7fd41c37cc20 sd=146 pgs=116 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256265-2011-02-17 15:45:18.481174 7fd4182c2940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6802/12830 pipe(0xebaaf0 sd=56 pgs=94 cs=1 l=0).fault 0: Success
osd.68.log-256266-2011-02-17 15:45:18.481207 7fd4182c2940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6802/12830 pipe(0xebaaf0 sd=56 pgs=94 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256267-2011-02-17 15:45:18.481228 7fd41b3f3940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6802/12569 pipe(0xd2bd50 sd=30 pgs=87 cs=1 l=0).fault 0: Success
osd.68.log-256268-2011-02-17 15:45:18.481262 7fd41b3f3940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6802/12569 pipe(0xd2bd50 sd=30 pgs=87 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256269-2011-02-17 15:45:18.481284 7fd41a4e4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6811/13703 pipe(0xe40d50 sd=39 pgs=89 cs=1 l=0).fault 0: Success
osd.68.log-256270-2011-02-17 15:45:18.481316 7fd41a4e4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6811/13703 pipe(0xe40d50 sd=39 pgs=89 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256271-2011-02-17 15:45:18.481337 7fd4194d4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6808/13607 pipe(0x7fd41c1f6be0 sd=26 pgs=134 cs=1 l=0).fault 0: Success
osd.68.log-256272-2011-02-17 15:45:18.481372 7fd4194d4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6808/13607 pipe(0x7fd41c1f6be0 sd=26 pgs=134 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256273-2011-02-17 15:45:18.481393 7fd414282940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6811/13069 pipe(0x7fd41c053c20 sd=88 pgs=97 cs=1 l=0).fault 0: Success
osd.68.log-256274-2011-02-17 15:45:18.481426 7fd414282940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6811/13069 pipe(0x7fd41c053c20 sd=88 pgs=97 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256275-2011-02-17 15:45:18.481591 7fd412767940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6823/14021 pipe(0x7fd41c1b7c30 sd=100 pgs=87 cs=1 l=0).fault 0: Success
osd.68.log-256276-2011-02-17 15:45:18.481633 7fd412767940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6823/14021 pipe(0x7fd41c1b7c30 sd=100 pgs=87 cs=1 l=0).fault with nothing to send, going to standby
osd.68.log-256277-2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status
osd.68.log-256278-2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done


See anything useful in there?

Let me know if there's anything I can do to get you more information about this.
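
In case it helps, here's a minimal sketch (hypothetical - not something
from the ceph tree or my test setup) of the kind of scan I've been doing
by hand: it walks a log and reports places where consecutive entries are
more than a few seconds apart, assuming the usual
"YYYY-MM-DD HH:MM:SS.ffffff" timestamp prefix on each line:

import sys
from datetime import datetime

GAP = 5.0   # flag anything quieter than this many seconds

def stamp(line):
    # log lines start with e.g. "2011-02-17 15:45:18.481656 ..."
    try:
        return datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None

prev_t, prev_line, prev_no = None, None, 0
for no, line in enumerate(open(sys.argv[1]), 1):
    t = stamp(line)
    if t is None:
        continue
    if prev_t is not None:
        dt = (t - prev_t).total_seconds()
        if dt > GAP:
            print("%.1fs gap between lines %d and %d" % (dt, prev_no, no))
            print("  " + prev_line.rstrip())
            print("  " + line.rstrip())
    prev_t, prev_line, prev_no = t, line, no

Run against a whole osd log it flags the windows where the daemon logs
nothing at all; run against a per-thread extract (e.g. the output of
grep 7fd42ad57940 osd.68.log) it flags per-thread stalls like the
28-second gap in tick() above.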

> 
> Thanks for looking at this!

No problem :)

-- Jim

> sage
> 
> 
> 
> > 
> > -- Jim
> > 
> > > 
> > > sage
> > > 
> > 
> > 
> 



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17 23:31             ` Jim Schutt
@ 2011-02-18  7:13               ` Sage Weil
  2011-02-18 17:04                 ` Jim Schutt
                                   ` (3 more replies)
  2011-03-09 16:02               ` Jim Schutt
  1 sibling, 4 replies; 94+ messages in thread
From: Sage Weil @ 2011-02-18  7:13 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 17 Feb 2011, Jim Schutt wrote:
> Check out the result:
> 
> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> osd.68.log:256280:2011-02-17 15:45:18.481705 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> osd.68.log:256281:2011-02-17 15:45:18.481712 7fd42ad57940 osd68 5 tick got read lock on map_lock
> osd.68.log:256688:2011-02-17 15:45:20.010705 7fd42ad57940 osd68 5 tick sending mon report
> osd.68.log:256753:2011-02-17 15:45:20.012950 7fd42ad57940 osd68 5 tick removing stray pgs
> osd.68.log:256754:2011-02-17 15:45:20.012959 7fd42ad57940 osd68 5 tick sending log to logclient
> osd.68.log:256755:2011-02-17 15:45:20.012965 7fd42ad57940 osd68 5 tick arming timer for next tick
> osd.68.log:256756:2011-02-17 15:45:20.012976 7fd42ad57940 osd68 5 tick checking dispatch queue status
> osd.68.log:256757:2011-02-17 15:45:20.012993 7fd42ad57940 osd68 5 tick done
> 
> Why should it take 28 seconds to add a new timer event?

Huh.. that is pretty weird.  I see multiple syncs in there, too, so it's
not like something was somehow blocking on a btrfs commit.

Anybody else have ideas?  :/
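
One generic thing that could produce this (a toy sketch only, not the
actual cosd/SafeTimer code path): if arming the next tick has to take a
lock that some other thread can hold across a slow operation, the tick
thread sits in that one step for as long as the lock is held, while every
other thread keeps logging normally, which is exactly the signature in
the log above.

# Toy illustration only, not cosd code.  "Arming a timer" needs
# timer_lock; another thread holds that lock across something slow,
# so the arming thread blocks for the whole duration.
import threading, time

timer_lock = threading.Lock()
events = []

def add_event_after(delay, callback):
    with timer_lock:                 # blocks while the hog holds the lock
        events.append((time.time() + delay, callback))

def lock_hog():
    with timer_lock:
        time.sleep(28)               # stand-in for whatever the slow path is

hog = threading.Thread(target=lock_hog)
hog.start()
time.sleep(0.1)                      # let the hog grab the lock first

start = time.time()
add_event_after(1.0, lambda: None)
print("arming the timer took %.1f seconds" % (time.time() - start))
hog.join()

No idea yet whether anything in that path actually shares a lock like
that; it's just one mechanism that matches the symptom.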

sage

> 
> Here's the full log spanning that 28 second gap in tick():
> 
> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> osd.68.log-256028-2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> osd.68.log-256029-2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> osd.68.log-256030-2011-02-17 15:44:50.141540 7fd42ad57940 osd68 5 scrub_should_schedule loadavg 32.73 >= max 1.25 = no, load too high
> osd.68.log-256031-2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> osd.68.log-256032-2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> osd.68.log-256033-2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> osd.68.log-256034-2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick
> osd.68.log-256035-2011-02-17 15:44:52.813591 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.002862
> osd.68.log-256036-2011-02-17 15:44:52.813653 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3069
> osd.68.log-256037-2011-02-17 15:44:52.813661 7fd429554940 journal commit_start committing 3082, still blocked
> osd.68.log-256038-2011-02-17 15:44:52.813666 7fd429554940 journal commit_start
> osd.68.log-256039-2011-02-17 15:44:52.813672 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry committing 3082 sync_epoch 48
> osd.68.log-256040-2011-02-17 15:44:52.813945 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) taking async snap 'snap_3082'
> osd.68.log-256041-2011-02-17 15:44:53.693394 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) async snap create 'snap_3082' transid 65 got 0 Success
> osd.68.log-256042-2011-02-17 15:44:53.693437 7fd429554940 journal commit_started committing 3082, unblocking
> osd.68.log-256043-2011-02-17 15:44:53.693445 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068)  waiting for transid 65 to complete
> osd.68.log-256044-2011-02-17 15:44:53.751214 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068)  done waiting for transid 65 to complete
> osd.68.log-256045-2011-02-17 15:44:53.751286 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry commit took 0.937614
> osd.68.log-256046-2011-02-17 15:44:53.751295 7fd429554940 journal commit_finish thru 3082
> osd.68.log-256047-2011-02-17 15:44:55.793734 7fd42554c940 -- 172.17.40.29:6814/12558 <== osd29 172.17.40.24:6817/11926 286 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (4012807742 0 0) 0x7fd404061cf0 con 0xeef270
> osd.68.log-256048-2011-02-17 15:44:55.793804 7fd42554c940 osd68 5 heartbeat_dispatch 0x7fd404061cf0
> osd.68.log-256049-2011-02-17 15:44:55.793811 7fd42554c940 osd68 5 handle_osd_ping from osd29 got stat stat(2011-02-17 15:44:49.760084 oprate=7.39631 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
> osd.68.log-256050-2011-02-17 15:44:55.936072 7fd42a556940 journal write_thread_entry going to sleep
> osd.68.log-256051-2011-02-17 15:44:55.936131 7fd429554940 journal committed_thru 3082 (last_committed_seq 3069)
> osd.68.log-256052-2011-02-17 15:44:55.936142 7fd429554940 journal header: block_size 4096 alignment 4096 max_size 526385152
> osd.68.log-256053-2011-02-17 15:44:55.936149 7fd429554940 journal header: start 499519488
> osd.68.log-256054-2011-02-17 15:44:55.936154 7fd429554940 journal  write_pos 499519488
> osd.68.log-256055-2011-02-17 15:44:55.936160 7fd429554940 journal committed_thru done
> osd.68.log-256056-2011-02-17 15:44:55.936175 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) removing snap 'snap_3038'
> osd.68.log-256057-2011-02-17 15:44:56.030632 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry committed to op_seq 3082
> osd.68.log-256058-2011-02-17 15:44:56.030670 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
> osd.68.log-256059-2011-02-17 15:44:57.763019 7fd425d4d940 -- 172.17.40.29:6813/12558 <== osd31 172.17.40.24:6822/12076 34 ==== osd_sub_op_reply(client4196.1:143 0.b03 100000003e9.0000008e/head [] ack = 0) v1 ==== 127+0+0 (603260922 0 0) 0x7fd41c005fe0 con 0xe2ec20
> osd.68.log-256060-2011-02-17 15:45:01.030890 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.000218
> osd.68.log-256061-2011-02-17 15:45:01.030964 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3082
> osd.68.log-256062-2011-02-17 15:45:01.030970 7fd429554940 journal commit_start nothing to do
> osd.68.log-256063-2011-02-17 15:45:01.030976 7fd429554940 journal commit_start
> osd.68.log-256064-2011-02-17 15:45:01.030986 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
> osd.68.log-256065-2011-02-17 15:45:06.031075 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.000087
> osd.68.log-256066-2011-02-17 15:45:06.031104 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3082
> osd.68.log-256067-2011-02-17 15:45:06.031114 7fd429554940 journal commit_start nothing to do
> osd.68.log-256068-2011-02-17 15:45:06.031122 7fd429554940 journal commit_start
> osd.68.log-256069-2011-02-17 15:45:06.031134 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
> osd.68.log-256070-2011-02-17 15:45:10.642177 7fd40adee940 -- 172.17.40.29:6812/12558 >> 172.17.40.58:0/3091507398 pipe(0x7fd3fc0ed010 sd=214 pgs=90 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256071-2011-02-17 15:45:10.642274 7fd40adee940 -- 172.17.40.29:6812/12558 >> 172.17.40.58:0/3091507398 pipe(0x7fd3fc0ed010 sd=214 pgs=90 cs=1 l=1).fault 0: Success
> osd.68.log-256072-2011-02-17 15:45:10.642431 7fd40aff0940 -- 172.17.40.29:6812/12558 >> 172.17.40.74:0/2165351138 pipe(0xdbac20 sd=156 pgs=6 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256073-2011-02-17 15:45:10.642461 7fd40aff0940 -- 172.17.40.29:6812/12558 >> 172.17.40.74:0/2165351138 pipe(0xdbac20 sd=156 pgs=6 cs=1 l=1).fault 0: Success
> osd.68.log-256074-2011-02-17 15:45:10.642606 7fd3fb0f2940 -- 172.17.40.29:6812/12558 >> 172.17.40.57:0/579985139 pipe(0x105ec30 sd=198 pgs=65 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256075-2011-02-17 15:45:10.642670 7fd3fb0f2940 -- 172.17.40.29:6812/12558 >> 172.17.40.57:0/579985139 pipe(0x105ec30 sd=198 pgs=65 cs=1 l=1).fault 0: Success
> osd.68.log-256076-2011-02-17 15:45:10.642790 7fd408cd9940 -- 172.17.40.29:6812/12558 >> 172.17.40.65:0/2115694822 pipe(0x1032d20 sd=161 pgs=10 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256077-2011-02-17 15:45:10.642818 7fd408cd9940 -- 172.17.40.29:6812/12558 >> 172.17.40.65:0/2115694822 pipe(0x1032d20 sd=161 pgs=10 cs=1 l=1).fault 0: Success
> osd.68.log-256078-2011-02-17 15:45:10.643238 7fd3faff1940 -- 172.17.40.29:6812/12558 >> 172.17.40.66:0/892415451 pipe(0xc7fe50 sd=199 pgs=69 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256079-2011-02-17 15:45:10.643268 7fd3faff1940 -- 172.17.40.29:6812/12558 >> 172.17.40.66:0/892415451 pipe(0xc7fe50 sd=199 pgs=69 cs=1 l=1).fault 0: Success
> osd.68.log-256080-2011-02-17 15:45:10.643291 7fd4025e5940 -- 172.17.40.29:6812/12558 >> 172.17.40.70:0/2135188772 pipe(0xf88c20 sd=176 pgs=44 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256081-2011-02-17 15:45:10.643362 7fd4025e5940 -- 172.17.40.29:6812/12558 >> 172.17.40.70:0/2135188772 pipe(0xf88c20 sd=176 pgs=44 cs=1 l=1).fault 0: Success
> osd.68.log-256082-2011-02-17 15:45:10.643507 7fd40a4e9940 -- 172.17.40.29:6812/12558 >> 172.17.40.62:0/59132854 pipe(0xd72cf0 sd=157 pgs=5 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256083-2011-02-17 15:45:10.643622 7fd40a4e9940 -- 172.17.40.29:6812/12558 >> 172.17.40.62:0/59132854 pipe(0xd72cf0 sd=157 pgs=5 cs=1 l=1).fault 0: Success
> osd.68.log-256084-2011-02-17 15:45:10.643870 7fd409ce3940 -- 172.17.40.29:6812/12558 >> 172.17.40.84:0/103596728 pipe(0xca56a0 sd=159 pgs=5 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256085-2011-02-17 15:45:10.643902 7fd409ce3940 -- 172.17.40.29:6812/12558 >> 172.17.40.84:0/103596728 pipe(0xca56a0 sd=159 pgs=5 cs=1 l=1).fault 0: Success
> osd.68.log-256086-2011-02-17 15:45:10.644044 7fd3f88d2940 -- 172.17.40.29:6812/12558 >> 172.17.40.77:0/3894807147 pipe(0x7fd3fc008fc0 sd=210 pgs=83 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256087-2011-02-17 15:45:10.644084 7fd3f88d2940 -- 172.17.40.29:6812/12558 >> 172.17.40.77:0/3894807147 pipe(0x7fd3fc008fc0 sd=210 pgs=83 cs=1 l=1).fault 0: Success
> osd.68.log-256088-2011-02-17 15:45:10.644274 7fd4009c9940 -- 172.17.40.29:6812/12558 >> 172.17.40.63:0/118294342 pipe(0xd53c20 sd=190 pgs=44 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256089-2011-02-17 15:45:10.644302 7fd4009c9940 -- 172.17.40.29:6812/12558 >> 172.17.40.63:0/118294342 pipe(0xd53c20 sd=190 pgs=44 cs=1 l=1).fault 0: Success
> osd.68.log-256090-2011-02-17 15:45:10.644446 7fd40abec940 -- 172.17.40.29:6812/12558 >> 172.17.40.76:0/3517121089 pipe(0x7fd3fc5de370 sd=215 pgs=86 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256091-2011-02-17 15:45:10.644509 7fd40abec940 -- 172.17.40.29:6812/12558 >> 172.17.40.76:0/3517121089 pipe(0x7fd3fc5de370 sd=215 pgs=86 cs=1 l=1).fault 0: Success
> osd.68.log-256092-2011-02-17 15:45:10.644602 7fd3f9bdd940 -- 172.17.40.29:6812/12558 >> 172.17.40.55:0/1343812630 pipe(0x7fd3fc008250 sd=209 pgs=75 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256093-2011-02-17 15:45:10.644636 7fd3f9bdd940 -- 172.17.40.29:6812/12558 >> 172.17.40.55:0/1343812630 pipe(0x7fd3fc008250 sd=209 pgs=75 cs=1 l=1).fault 0: Success
> osd.68.log-256094-2011-02-17 15:45:10.645375 7fd401bdb940 -- 172.17.40.29:6812/12558 >> 172.17.40.72:0/658453639 pipe(0xe3ad20 sd=182 pgs=28 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256095-2011-02-17 15:45:10.645394 7fd401bdb940 -- 172.17.40.29:6812/12558 >> 172.17.40.72:0/658453639 pipe(0xe3ad20 sd=182 pgs=28 cs=1 l=1).fault 0: Success
> osd.68.log-256096-2011-02-17 15:45:10.645714 7fd4033f3940 -- 172.17.40.29:6812/12558 >> 172.17.40.85:0/2266176512 pipe(0x1028c50 sd=169 pgs=9 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256097-2011-02-17 15:45:10.645740 7fd4033f3940 -- 172.17.40.29:6812/12558 >> 172.17.40.85:0/2266176512 pipe(0x1028c50 sd=169 pgs=9 cs=1 l=1).fault 0: Success
> osd.68.log-256098-2011-02-17 15:45:10.646231 7fd3fa0e2940 -- 172.17.40.29:6812/12558 >> 172.17.40.103:0/1723900743 pipe(0x7fd3fc002d70 sd=206 pgs=85 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256099-2011-02-17 15:45:10.646304 7fd3fa0e2940 -- 172.17.40.29:6812/12558 >> 172.17.40.103:0/1723900743 pipe(0x7fd3fc002d70 sd=206 pgs=85 cs=1 l=1).fault 0: Success
> osd.68.log-256100-2011-02-17 15:45:10.646372 7fd3f42ae940 -- 172.17.40.29:6812/12558 >> 172.17.40.101:0/1743123736 pipe(0x7fd3fc019720 sd=211 pgs=40 cs=1 l=1).reader couldn't read tag, Success
> osd.68.log-256101-2011-02-17 15:45:10.646397 7fd3f42ae940 -- 172.17.40.29:6812/12558 >> 172.17.40.101:0/1743123736 pipe(0x7fd3fc019720 sd=211 pgs=40 cs=1 l=1).fault 0: Success
> osd.68.log-256102-2011-02-17 15:45:10.844357 7fd410747940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6822/11833 pipe(0x7fd41c03bc20 sd=115 pgs=65 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256103-2011-02-17 15:45:10.844395 7fd410747940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6822/11833 pipe(0x7fd41c03bc20 sd=115 pgs=65 cs=1 l=0).fault 0: Success
> osd.68.log-256104-2011-02-17 15:45:10.844418 7fd410747940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6822/11833 pipe(0x7fd41c03bc20 sd=115 pgs=65 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256105-2011-02-17 15:45:11.031203 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry woke after 5.000067
> osd.68.log-256106-2011-02-17 15:45:11.031233 7fd429554940 journal commit_start op_seq 3082, applied_seq 3082, committed_seq 3082
> osd.68.log-256107-2011-02-17 15:45:11.031240 7fd429554940 journal commit_start nothing to do
> osd.68.log-256108-2011-02-17 15:45:11.031246 7fd429554940 journal commit_start
> osd.68.log-256109-2011-02-17 15:45:11.031254 7fd429554940 filestore(/ram/mnt/ceph/data.osd.0068) sync_entry waiting for max_interval 5.000000
> osd.68.log-256110-2011-02-17 15:45:11.193386 7fd4183c3940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6801/12830 pipe(0xeba000 sd=55 pgs=53 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256111-2011-02-17 15:45:11.193424 7fd4183c3940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6801/12830 pipe(0xeba000 sd=55 pgs=53 cs=1 l=0).fault 0: Success
> osd.68.log-256112-2011-02-17 15:45:11.193451 7fd4183c3940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6801/12830 pipe(0xeba000 sd=55 pgs=53 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256113-2011-02-17 15:45:11.291656 7fd4199d9940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6810/12606 pipe(0xec3c20 sd=43 pgs=48 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256114-2011-02-17 15:45:11.291738 7fd4199d9940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6810/12606 pipe(0xec3c20 sd=43 pgs=48 cs=1 l=0).fault 0: Success
> osd.68.log-256115-2011-02-17 15:45:11.291766 7fd4199d9940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6810/12606 pipe(0xec3c20 sd=43 pgs=48 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256116-2011-02-17 15:45:11.699576 7fd40b3f4940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6820/12848 pipe(0x7fd41c095d30 sd=150 pgs=110 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256117-2011-02-17 15:45:15.907490 7fd40b3f4940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6820/12848 pipe(0x7fd41c095d30 sd=150 pgs=110 cs=1 l=0).fault 0: Success
> osd.68.log-256118-2011-02-17 15:45:15.907579 7fd40b3f4940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6820/12848 pipe(0x7fd41c095d30 sd=150 pgs=110 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256119-2011-02-17 15:45:15.907594 7fd416faf940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6804/12907 pipe(0xeef800 sd=65 pgs=59 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256120-2011-02-17 15:45:15.907616 7fd416faf940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6804/12907 pipe(0xeef800 sd=65 pgs=59 cs=1 l=0).fault 0: Success
> osd.68.log-256121-2011-02-17 15:45:15.907637 7fd416faf940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6804/12907 pipe(0xeef800 sd=65 pgs=59 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256122-2011-02-17 15:45:15.907655 7fd417ebe940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6813/13164 pipe(0xed7260 sd=59 pgs=60 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256123-2011-02-17 15:45:15.907675 7fd417ebe940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6813/13164 pipe(0xed7260 sd=59 pgs=60 cs=1 l=0).fault 0: Success
> osd.68.log-256124-2011-02-17 15:45:15.907695 7fd417ebe940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6813/13164 pipe(0xed7260 sd=59 pgs=60 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256125-2011-02-17 15:45:15.907734 7fd412363940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6814/13164 pipe(0x7fd41c116c30 sd=70 pgs=127 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256126-2011-02-17 15:45:15.907763 7fd412363940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6814/13164 pipe(0x7fd41c116c30 sd=70 pgs=127 cs=1 l=0).fault 0: Success
> osd.68.log-256127-2011-02-17 15:45:15.907814 7fd412363940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6814/13164 pipe(0x7fd41c116c30 sd=70 pgs=127 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256128-2011-02-17 15:45:15.907829 7fd420433940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6819/11475 pipe(0xd4fd40 sd=23 pgs=54 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256129-2011-02-17 15:45:15.907850 7fd420433940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6819/11475 pipe(0xd4fd40 sd=23 pgs=54 cs=1 l=0).fault 0: Success
> osd.68.log-256130-2011-02-17 15:45:15.907871 7fd420433940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6819/11475 pipe(0xd4fd40 sd=23 pgs=54 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256131-2011-02-17 15:45:15.909670 7fd40cc0c940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6805/12657 pipe(0x7fd41c378cc0 sd=82 pgs=111 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256132-2011-02-17 15:45:15.910080 7fd40cc0c940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6805/12657 pipe(0x7fd41c378cc0 sd=82 pgs=111 cs=1 l=0).fault 0: Success
> osd.68.log-256133-2011-02-17 15:45:15.910108 7fd40cc0c940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6805/12657 pipe(0x7fd41c378cc0 sd=82 pgs=111 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256134-2011-02-17 15:45:15.910127 7fd411e5e940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6820/11475 pipe(0x7fd41c161d60 sd=86 pgs=131 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256135-2011-02-17 15:45:15.910194 7fd411e5e940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6820/11475 pipe(0x7fd41c161d60 sd=86 pgs=131 cs=1 l=0).fault 0: Success
> osd.68.log-256136-2011-02-17 15:45:15.910215 7fd411e5e940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6820/11475 pipe(0x7fd41c161d60 sd=86 pgs=131 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256137-2011-02-17 15:45:15.910228 7fd420938940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6810/11227 pipe(0xf20c30 sd=20 pgs=57 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256138-2011-02-17 15:45:15.910247 7fd420938940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6810/11227 pipe(0xf20c30 sd=20 pgs=57 cs=1 l=0).fault 0: Success
> osd.68.log-256139-2011-02-17 15:45:15.910266 7fd420938940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6810/11227 pipe(0xf20c30 sd=20 pgs=57 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256140-2011-02-17 15:45:15.910291 7fd410040940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6816/13237 pipe(0x7fd41c31ac20 sd=116 pgs=70 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256141-2011-02-17 15:45:15.910311 7fd410040940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6816/13237 pipe(0x7fd41c31ac20 sd=116 pgs=70 cs=1 l=0).fault 0: Success
> osd.68.log-256142-2011-02-17 15:45:15.910332 7fd410040940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6816/13237 pipe(0x7fd41c31ac20 sd=116 pgs=70 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256143-2011-02-17 15:45:15.910350 7fd40bafb940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6802/12017 pipe(0x7fd41c02db90 sd=143 pgs=129 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256144-2011-02-17 15:45:15.910597 7fd40bafb940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6802/12017 pipe(0x7fd41c02db90 sd=143 pgs=129 cs=1 l=0).fault 0: Success
> osd.68.log-256145-2011-02-17 15:45:15.910618 7fd40bafb940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6802/12017 pipe(0x7fd41c02db90 sd=143 pgs=129 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256146-2011-02-17 15:45:15.910667 7fd40c909940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6820/11999 pipe(0x7fd41c141ce0 sd=110 pgs=106 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256147-2011-02-17 15:45:15.912873 7fd40c909940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6820/11999 pipe(0x7fd41c141ce0 sd=110 pgs=106 cs=1 l=0).fault 0: Success
> osd.68.log-256148-2011-02-17 15:45:15.912926 7fd40c909940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6820/11999 pipe(0x7fd41c141ce0 sd=110 pgs=106 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256149-2011-02-17 15:45:15.912944 7fd41befe940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6822/11561 pipe(0xd70c20 sd=25 pgs=58 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256150-2011-02-17 15:45:15.912965 7fd41befe940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6822/11561 pipe(0xd70c20 sd=25 pgs=58 cs=1 l=0).fault 0: Success
> osd.68.log-256151-2011-02-17 15:45:15.912986 7fd41befe940 -- 172.17.40.29:6813/12558 >> 172.17.40.26:6822/11561 pipe(0xd70c20 sd=25 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256152-2011-02-17 15:45:15.913111 7fd4195d5940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6822/12076 pipe(0xc72d50 sd=46 pgs=58 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256153-2011-02-17 15:45:15.913133 7fd4195d5940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6822/12076 pipe(0xc72d50 sd=46 pgs=58 cs=1 l=0).fault 0: Success
> osd.68.log-256154-2011-02-17 15:45:15.913156 7fd4195d5940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6822/12076 pipe(0xc72d50 sd=46 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256155-2011-02-17 15:45:15.913184 7fd40b8f9940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6820/13948 pipe(0x7fd41c10fb90 sd=142 pgs=129 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256156-2011-02-17 15:45:15.914376 7fd40b8f9940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6820/13948 pipe(0x7fd41c10fb90 sd=142 pgs=129 cs=1 l=0).fault 0: Success
> osd.68.log-256157-2011-02-17 15:45:15.914399 7fd40b8f9940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6820/13948 pipe(0x7fd41c10fb90 sd=142 pgs=129 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256158-2011-02-17 15:45:15.914423 7fd412c6c940 -- 172.17.40.29:6813/12558 >> 172.17.40.32:6816/12882 pipe(0xea1f90 sd=98 pgs=61 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256159-2011-02-17 15:45:15.914443 7fd412c6c940 -- 172.17.40.29:6813/12558 >> 172.17.40.32:6816/12882 pipe(0xea1f90 sd=98 pgs=61 cs=1 l=0).fault 0: Success
> osd.68.log-256160-2011-02-17 15:45:15.914462 7fd412c6c940 -- 172.17.40.29:6813/12558 >> 172.17.40.32:6816/12882 pipe(0xea1f90 sd=98 pgs=61 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256161-2011-02-17 15:45:15.914479 7fd40f838940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6804/13532 pipe(0x7fd41c06bc20 sd=119 pgs=71 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256162-2011-02-17 15:45:15.914499 7fd40f838940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6804/13532 pipe(0x7fd41c06bc20 sd=119 pgs=71 cs=1 l=0).fault 0: Success
> osd.68.log-256163-2011-02-17 15:45:15.914519 7fd40f838940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6804/13532 pipe(0x7fd41c06bc20 sd=119 pgs=71 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256164-2011-02-17 15:45:15.914532 7fd40bbfc940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6817/12882 pipe(0x7fd41c1fad30 sd=136 pgs=119 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256165-2011-02-17 15:45:15.914935 7fd40bbfc940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6817/12882 pipe(0x7fd41c1fad30 sd=136 pgs=119 cs=1 l=0).fault 0: Success
> osd.68.log-256166-2011-02-17 15:45:15.914957 7fd40bbfc940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6817/12882 pipe(0x7fd41c1fad30 sd=136 pgs=119 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256167-2011-02-17 15:45:15.914971 7fd40b6f7940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6817/13853 pipe(0x7fd41c186ba0 sd=137 pgs=132 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256168-2011-02-17 15:45:15.915453 7fd40b6f7940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6817/13853 pipe(0x7fd41c186ba0 sd=137 pgs=132 cs=1 l=0).fault 0: Success
> osd.68.log-256169-2011-02-17 15:45:15.915467 7fd41a6e6940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6808/12993 pipe(0xe14c20 sd=37 pgs=90 cs=1 l=0).fault 0: Success
> osd.68.log-256170-2011-02-17 15:45:15.915507 7fd41a6e6940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6808/12993 pipe(0xe14c20 sd=37 pgs=90 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256171-2011-02-17 15:45:15.915542 7fd421241940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6808/11429 pipe(0xc63000 sd=15 pgs=136 cs=1 l=0).fault 107: Transport endpoint is not connected
> osd.68.log-256172-2011-02-17 15:45:15.915574 7fd421241940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6808/11429 pipe(0xc63000 sd=15 pgs=136 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256173-2011-02-17 15:45:15.915644 7fd421140940 -- 172.17.40.29:6813/12558 >> 172.17.40.29:6810/12473 pipe(0xcb0300 sd=14 pgs=1 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256174-2011-02-17 15:45:15.915778 7fd421140940 -- 172.17.40.29:6813/12558 >> 172.17.40.29:6810/12473 pipe(0xcb0300 sd=14 pgs=1 cs=1 l=0).fault 0: Success
> osd.68.log-256175-2011-02-17 15:45:15.915826 7fd421140940 -- 172.17.40.29:6813/12558 >> 172.17.40.29:6810/12473 pipe(0xcb0300 sd=14 pgs=1 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256176-2011-02-17 15:45:15.915927 7fd40e929940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6802/13458 pipe(0x7fd41c228c20 sd=135 pgs=121 cs=1 l=0).fault 0: Success
> osd.68.log-256177-2011-02-17 15:45:15.915964 7fd40e929940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6802/13458 pipe(0x7fd41c228c20 sd=135 pgs=121 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256178-2011-02-17 15:45:15.915994 7fd415797940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6805/11346 pipe(0x7fd41c28fc20 sd=60 pgs=120 cs=1 l=0).fault 0: Success
> osd.68.log-256179-2011-02-17 15:45:15.916021 7fd415797940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6805/11346 pipe(0x7fd41c28fc20 sd=60 pgs=120 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256180-2011-02-17 15:45:15.916283 7fd413676940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6811/11754 pipe(0x7fd41c265c20 sd=41 pgs=112 cs=1 l=0).fault 0: Success
> osd.68.log-256181-2011-02-17 15:45:15.916313 7fd413676940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6811/11754 pipe(0x7fd41c265c20 sd=41 pgs=112 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256182-2011-02-17 15:45:15.916456 7fd415595940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6807/13607 pipe(0x7fd41c033c20 sd=120 pgs=77 cs=1 l=0).fault 0: Success
> osd.68.log-256183-2011-02-17 15:45:15.916480 7fd415595940 -- 172.17.40.29:6813/12558 >> 172.17.40.23:6807/13607 pipe(0x7fd41c033c20 sd=120 pgs=77 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256184-2011-02-17 15:45:15.918620 7fd4197d7940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6823/12076 pipe(0xcafc20 sd=45 pgs=94 cs=1 l=0).fault 0: Success
> osd.68.log-256185-2011-02-17 15:45:15.920192 7fd4197d7940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6823/12076 pipe(0xcafc20 sd=45 pgs=94 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256186-2011-02-17 15:45:15.920214 7fd41aaea940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6807/12993 pipe(0xdd4c20 sd=35 pgs=54 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256187-2011-02-17 15:45:15.920234 7fd41aaea940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6807/12993 pipe(0xdd4c20 sd=35 pgs=54 cs=1 l=0).fault 0: Success
> osd.68.log-256188-2011-02-17 15:45:15.920254 7fd41aaea940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6807/12993 pipe(0xdd4c20 sd=35 pgs=54 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256189-2011-02-17 15:45:15.920276 7fd40ca0a940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6808/12636 pipe(0x7fd41c32ac20 sd=107 pgs=120 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256190-2011-02-17 15:45:15.920303 7fd40ca0a940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6808/12636 pipe(0x7fd41c32ac20 sd=107 pgs=120 cs=1 l=0).fault 0: Success
> osd.68.log-256191-2011-02-17 15:45:15.920323 7fd40ca0a940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6808/12636 pipe(0x7fd41c32ac20 sd=107 pgs=120 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256192-2011-02-17 15:45:15.924700 7fd4176b6940 -- 172.17.40.29:6813/12558 >> 172.17.40.25:6804/12657 pipe(0xee10c0 sd=61 pgs=61 cs=1 l=0).fault 0: Success
> osd.68.log-256193-2011-02-17 15:45:15.924802 7fd4176b6940 -- 172.17.40.29:6813/12558 >> 172.17.40.25:6804/12657 pipe(0xee10c0 sd=61 pgs=61 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256194-2011-02-17 15:45:15.924854 7fd412565940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6819/12848 pipe(0xd35b90 sd=101 pgs=58 cs=1 l=0).fault 0: Success
> osd.68.log-256195-2011-02-17 15:45:15.924875 7fd412565940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6819/12848 pipe(0xd35b90 sd=101 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256196-2011-02-17 15:45:15.925126 7fd4015d5940 -- 172.17.40.29:6812/12558 >> 172.17.40.50:0/2223503458 pipe(0xec9940 sd=185 pgs=40 cs=1 l=1).fault 0: Success
> osd.68.log-256197-2011-02-17 15:45:15.926469 7fd4171b1940 -- 172.17.40.29:6814/12558 >> 172.17.40.24:6817/11926 pipe(0xeef000 sd=64 pgs=84 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256198-2011-02-17 15:45:18.465361 7fd40b7f8940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6814/12798 pipe(0x7fd41c3c0d30 sd=152 pgs=116 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256199-2011-02-17 15:45:18.472367 7fd40b7f8940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6814/12798 pipe(0x7fd41c3c0d30 sd=152 pgs=116 cs=1 l=0).fault 0: Success
> osd.68.log-256200-2011-02-17 15:45:18.472401 7fd40b7f8940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6814/12798 pipe(0x7fd41c3c0d30 sd=152 pgs=116 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256201-2011-02-17 15:45:18.472450 7fd418fcf940 -- 172.17.40.29:6813/12558 >> 172.17.40.28:6810/12265 pipe(0xf1b050 sd=51 pgs=64 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256202-2011-02-17 15:45:18.472472 7fd418fcf940 -- 172.17.40.29:6813/12558 >> 172.17.40.28:6810/12265 pipe(0xf1b050 sd=51 pgs=64 cs=1 l=0).fault 0: Success
> osd.68.log-256203-2011-02-17 15:45:18.472523 7fd418fcf940 -- 172.17.40.29:6813/12558 >> 172.17.40.28:6810/12265 pipe(0xf1b050 sd=51 pgs=64 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256204-2011-02-17 15:45:18.472547 7fd411a5a940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6811/12265 pipe(0x7fd41c169cb0 sd=77 pgs=129 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256205-2011-02-17 15:45:18.472580 7fd411a5a940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6811/12265 pipe(0x7fd41c169cb0 sd=77 pgs=129 cs=1 l=0).fault 0: Success
> osd.68.log-256206-2011-02-17 15:45:18.472602 7fd411a5a940 -- 172.17.40.29:6814/12558 >> 172.17.40.28:6811/12265 pipe(0x7fd41c169cb0 sd=77 pgs=129 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256207-2011-02-17 15:45:18.472615 7fd40b9fa940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6823/13041 pipe(0x7fd41c1ceb60 sd=138 pgs=111 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256208-2011-02-17 15:45:18.474761 7fd40b9fa940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6823/13041 pipe(0x7fd41c1ceb60 sd=138 pgs=111 cs=1 l=0).fault 0: Success
> osd.68.log-256209-2011-02-17 15:45:18.474810 7fd40b9fa940 -- 172.17.40.29:6814/12558 >> 172.17.40.32:6823/13041 pipe(0x7fd41c1ceb60 sd=138 pgs=111 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256210-2011-02-17 15:45:18.477408 7fd42654e940 -- 172.17.40.29:6812/12558 <== client4240 172.17.40.73:0/3624355571 1 ==== osd_op(client4240.1:111 10000004e34.0000006e [write 0~4194304 [1@-1]] 0.a156 snapc 1=[]) ==== 128+0+4194304 (2756195295 0 0) 0xe43f80 con 0x7fd3fc001c00
> osd.68.log-256211-2011-02-17 15:45:18.479547 7fd413c7c940 -- 172.17.40.29:6814/12558 >> 172.17.40.27:6823/13093 pipe(0x7fd41c15fc30 sd=90 pgs=97 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256212-2011-02-17 15:45:18.479635 7fd40e626940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6804/11346 pipe(0x7fd41c13cc20 sd=113 pgs=72 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256213-2011-02-17 15:45:18.479736 7fd40e626940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6804/11346 pipe(0x7fd41c13cc20 sd=113 pgs=72 cs=1 l=0).fault 0: Success
> osd.68.log-256214-2011-02-17 15:45:18.479759 7fd40e626940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6804/11346 pipe(0x7fd41c13cc20 sd=113 pgs=72 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256215-2011-02-17 15:45:18.479776 7fd420d3c940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6807/11429 pipe(0xc63870 sd=17 pgs=4 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256216-2011-02-17 15:45:18.479805 7fd420d3c940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6807/11429 pipe(0xc63870 sd=17 pgs=4 cs=1 l=0).fault 0: Success
> osd.68.log-256217-2011-02-17 15:45:18.479827 7fd420d3c940 -- 172.17.40.29:6813/12558 >> 172.17.40.21:6807/11429 pipe(0xc63870 sd=17 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256218-2011-02-17 15:45:18.479846 7fd419cdc940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6822/12924 pipe(0xe8fc20 sd=42 pgs=58 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256219-2011-02-17 15:45:18.479867 7fd419cdc940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6822/12924 pipe(0xe8fc20 sd=42 pgs=58 cs=1 l=0).fault 0: Success
> osd.68.log-256220-2011-02-17 15:45:18.479898 7fd419cdc940 -- 172.17.40.29:6813/12558 >> 172.17.40.31:6822/12924 pipe(0xe8fc20 sd=42 pgs=58 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256221-2011-02-17 15:45:18.479915 7fd40f333940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6819/13310 pipe(0x7fd41c007c20 sd=117 pgs=75 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256222-2011-02-17 15:45:18.479936 7fd40f333940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6819/13310 pipe(0x7fd41c007c20 sd=117 pgs=75 cs=1 l=0).fault 0: Success
> osd.68.log-256223-2011-02-17 15:45:18.479955 7fd40f333940 -- 172.17.40.29:6813/12558 >> 172.17.40.22:6819/13310 pipe(0x7fd41c007c20 sd=117 pgs=75 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256224-2011-02-17 15:45:18.479973 7fd41a2e2940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6804/11594 pipe(0x7fd41c154c20 sd=122 pgs=71 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256225-2011-02-17 15:45:18.480025 7fd41a2e2940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6804/11594 pipe(0x7fd41c154c20 sd=122 pgs=71 cs=1 l=0).fault 0: Success
> osd.68.log-256226-2011-02-17 15:45:18.480044 7fd41a2e2940 -- 172.17.40.29:6813/12558 >> 172.17.40.24:6804/11594 pipe(0x7fd41c154c20 sd=122 pgs=71 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256227-2011-02-17 15:45:18.480074 7fd40d212940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6823/12924 pipe(0x7fd41c3fcc30 sd=104 pgs=120 cs=1 l=0).reader couldn't read tag, Success
> osd.68.log-256228-2011-02-17 15:45:18.480107 7fd40d212940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6823/12924 pipe(0x7fd41c3fcc30 sd=104 pgs=120 cs=1 l=0).fault 0: Success
> osd.68.log-256229-2011-02-17 15:45:18.480127 7fd40d212940 -- 172.17.40.29:6814/12558 >> 172.17.40.31:6823/12924 pipe(0x7fd41c3fcc30 sd=104 pgs=120 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256230-2011-02-17 15:45:18.480154 7fd427550940 osd68 5 pg[0.7fc( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [3,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd404067010 op osd_sub_op(client4258.1:35 0.7fc 10000008109.00000022/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
> osd.68.log-256231-2011-02-17 15:45:18.480271 7fd427550940 osd68 5 pg[0.680( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [90,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd404031e40 op osd_sub_op(client4224.1:33 0.680 1000000ac0c.00000020/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
> osd.68.log-256232-2011-02-17 15:45:18.480310 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] op_applied repgather(0x7fd3fc0eb480 applying 5'2 rep_tid=174 wfack=27,68 wfdisk=27 op=osd_op(client4244.1:126 1000000dee1.0000007d [write 0~4194304 [1@-1]] 0.588 snapc 1=[]))
> osd.68.log-256233-2011-02-17 15:45:18.480356 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] op_applied mode was rmw(wr=1)
> osd.68.log-256234-2011-02-17 15:45:18.480371 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] op_applied mode now idle(wr=0 WAKE) (finish_write)
> osd.68.log-256235-2011-02-17 15:45:18.480384 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] put_object_context 1000000dee1.0000007d/head 1 -> 0
> osd.68.log-256236-2011-02-17 15:45:18.480399 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] put_snapset_context 1000000dee1.0000007d 1 -> 0
> osd.68.log-256237-2011-02-17 15:45:18.480435 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] update_stats 3'13
> osd.68.log-256238-2011-02-17 15:45:18.480451 7fd427550940 osd68 5 pg[0.588( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,27] r=0 mlcod 0'0 active+clean] eval_repop repgather(0x7fd3fc0eb480 applied 5'2 rep_tid=174 wfack=27 wfdisk=27 op=osd_op(client4244.1:126 1000000dee1.0000007d [write 0~4194304 [1@-1]] 0.588 snapc 1=[])) wants=d
> osd.68.log-256239-2011-02-17 15:45:18.480471 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] op_applied repgather(0x7fd404013bf0 applying 5'2 rep_tid=175 wfack=31,68 wfdisk=31 op=osd_op(client4220.1:59 1000000ea9c.0000003a [write 0~4194304 [1@-1]] 0.cb03 snapc 1=[]))
> osd.68.log-256240-2011-02-17 15:45:18.480497 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] op_applied mode was rmw(wr=1)
> osd.68.log-256241-2011-02-17 15:45:18.480510 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] op_applied mode now idle(wr=0 WAKE) (finish_write)
> osd.68.log-256242-2011-02-17 15:45:18.480523 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] put_object_context 1000000ea9c.0000003a/head 1 -> 0
> osd.68.log-256243-2011-02-17 15:45:18.480536 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] put_snapset_context 1000000ea9c.0000003a 1 -> 0
> osd.68.log-256244-2011-02-17 15:45:18.480555 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] update_stats 3'13
> osd.68.log-256245-2011-02-17 15:45:18.480593 7fd427550940 osd68 5 pg[0.b03( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,31] r=0 mlcod 0'0 active+clean] eval_repop repgather(0x7fd404013bf0 applied 5'2 rep_tid=175 wfack=31 wfdisk=31 op=osd_op(client4220.1:59 1000000ea9c.0000003a [write 0~4194304 [1@-1]] 0.cb03 snapc 1=[])) wants=d
> osd.68.log-256246-2011-02-17 15:45:18.480612 7fd427550940 osd68 5 pg[0.b45( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [76,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd3fc031200 op osd_sub_op(client4221.1:17 0.b45 1000000aff5.00000010/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
> osd.68.log-256247-2011-02-17 15:45:18.480652 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] op_applied repgather(0x7fd404021370 applying 5'2 rep_tid=176 wfack=55,68 wfdisk=55 op=osd_op(client4246.1:15 10000007165.0000000e [write 0~4194304 [1@-1]] 0.4bcb snapc 1=[]))
> osd.68.log-256248-2011-02-17 15:45:18.480681 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] op_applied mode was rmw(wr=1)
> osd.68.log-256249-2011-02-17 15:45:18.480694 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] op_applied mode now idle(wr=0 WAKE) (finish_write)
> osd.68.log-256250-2011-02-17 15:45:18.480706 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] put_object_context 10000007165.0000000e/head 1 -> 0
> osd.68.log-256251-2011-02-17 15:45:18.480719 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] put_snapset_context 10000007165.0000000e 1 -> 0
> osd.68.log-256252-2011-02-17 15:45:18.480738 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] update_stats 3'13
> osd.68.log-256253-2011-02-17 15:45:18.480752 7fd427550940 osd68 5 pg[0.bcb( v 5'2 (0'0,5'2] n=2 ec=2 les=4 3/3/3) [68,55] r=0 mlcod 0'0 active+clean] eval_repop repgather(0x7fd404021370 applied 5'2 rep_tid=176 wfack=55 wfdisk=55 op=osd_op(client4246.1:15 10000007165.0000000e [write 0~4194304 [1@-1]] 0.4bcb snapc 1=[])) wants=d
> osd.68.log-256254-2011-02-17 15:45:18.480770 7fd427550940 osd68 5 pg[0.36b( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [21,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd3fc0a9050 op osd_sub_op(client4213.1:82 0.36b 10000005dd8.00000051/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
> osd.68.log-256255-2011-02-17 15:45:18.480811 7fd427550940 osd68 5 pg[0.38d( v 5'1 (0'0,5'1] n=1 ec=2 les=4 3/3/3) [5,68] r=1 luod=0'0 lcod 0'0 active] sub_op_modify_applied on 0x7fd40403f770 op osd_sub_op(client4256.1:52 0.38d 1000000b7c7.00000033/head [] v 5'1 snapset=0=[]:[] snapc=0=[]) v3
> osd.68.log-256256-2011-02-17 15:45:18.480901 7fd40db1b940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6820/13310 pipe(0xf5dc30 sd=133 pgs=123 cs=1 l=0).reader couldn't read tag, Transport endpoint is not connected
> osd.68.log-256257-2011-02-17 15:45:18.480929 7fd40db1b940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6820/13310 pipe(0xf5dc30 sd=133 pgs=123 cs=1 l=0).fault 107: Transport endpoint is not connected
> osd.68.log-256258-2011-02-17 15:45:18.480949 7fd40db1b940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6820/13310 pipe(0xf5dc30 sd=133 pgs=123 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256259-2011-02-17 15:45:18.481001 7fd4167a7940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6805/11056 pipe(0xf06700 sd=68 pgs=90 cs=1 l=0).fault 0: Success
> osd.68.log-256260-2011-02-17 15:45:18.481037 7fd4167a7940 -- 172.17.40.29:6814/12558 >> 172.17.40.26:6805/11056 pipe(0xf06700 sd=68 pgs=90 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256261-2011-02-17 15:45:18.481059 7fd413979940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6817/13237 pipe(0x7fd41c1fcc00 sd=34 pgs=124 cs=1 l=0).fault 0: Success
> osd.68.log-256262-2011-02-17 15:45:18.481085 7fd413979940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6817/13237 pipe(0x7fd41c1fcc00 sd=34 pgs=124 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256263-2011-02-17 15:45:18.481107 7fd40d919940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6823/11833 pipe(0x7fd41c37cc20 sd=146 pgs=116 cs=1 l=0).fault 0: Success
> osd.68.log-256264-2011-02-17 15:45:18.481139 7fd40d919940 -- 172.17.40.29:6814/12558 >> 172.17.40.21:6823/11833 pipe(0x7fd41c37cc20 sd=146 pgs=116 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256265-2011-02-17 15:45:18.481174 7fd4182c2940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6802/12830 pipe(0xebaaf0 sd=56 pgs=94 cs=1 l=0).fault 0: Success
> osd.68.log-256266-2011-02-17 15:45:18.481207 7fd4182c2940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6802/12830 pipe(0xebaaf0 sd=56 pgs=94 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256267-2011-02-17 15:45:18.481228 7fd41b3f3940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6802/12569 pipe(0xd2bd50 sd=30 pgs=87 cs=1 l=0).fault 0: Success
> osd.68.log-256268-2011-02-17 15:45:18.481262 7fd41b3f3940 -- 172.17.40.29:6814/12558 >> 172.17.40.25:6802/12569 pipe(0xd2bd50 sd=30 pgs=87 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256269-2011-02-17 15:45:18.481284 7fd41a4e4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6811/13703 pipe(0xe40d50 sd=39 pgs=89 cs=1 l=0).fault 0: Success
> osd.68.log-256270-2011-02-17 15:45:18.481316 7fd41a4e4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6811/13703 pipe(0xe40d50 sd=39 pgs=89 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256271-2011-02-17 15:45:18.481337 7fd4194d4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6808/13607 pipe(0x7fd41c1f6be0 sd=26 pgs=134 cs=1 l=0).fault 0: Success
> osd.68.log-256272-2011-02-17 15:45:18.481372 7fd4194d4940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6808/13607 pipe(0x7fd41c1f6be0 sd=26 pgs=134 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256273-2011-02-17 15:45:18.481393 7fd414282940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6811/13069 pipe(0x7fd41c053c20 sd=88 pgs=97 cs=1 l=0).fault 0: Success
> osd.68.log-256274-2011-02-17 15:45:18.481426 7fd414282940 -- 172.17.40.29:6814/12558 >> 172.17.40.22:6811/13069 pipe(0x7fd41c053c20 sd=88 pgs=97 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256275-2011-02-17 15:45:18.481591 7fd412767940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6823/14021 pipe(0x7fd41c1b7c30 sd=100 pgs=87 cs=1 l=0).fault 0: Success
> osd.68.log-256276-2011-02-17 15:45:18.481633 7fd412767940 -- 172.17.40.29:6814/12558 >> 172.17.40.23:6823/14021 pipe(0x7fd41c1b7c30 sd=100 pgs=87 cs=1 l=0).fault with nothing to send, going to standby
> osd.68.log-256277-2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status
> osd.68.log-256278-2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> 
> 
> See anything useful in there?
> 
> Let me know if there's anything I can do to get you more information about this.
> 
> > 
> > Thanks for looking at this!
> 
> No problem :)
> 
> -- Jim
> 
> > sage
> > 
> > 
> > 
> > > 
> > > -- Jim
> > > 
> > > > 
> > > > sage
> > > > 
> > > 
> > > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18  7:13               ` Sage Weil
@ 2011-02-18 17:04                 ` Jim Schutt
  2011-02-18 17:15                 ` Gregory Farnum
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-18 17:04 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-02-18 at 00:13 -0700, Sage Weil wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
> > Check out the result:
> >
> > osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> > osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> > osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> > osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> > osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> > osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> > osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> > osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> > osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> > osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> > osd.68.log:256280:2011-02-17 15:45:18.481705 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> > osd.68.log:256281:2011-02-17 15:45:18.481712 7fd42ad57940 osd68 5 tick got read lock on map_lock
> > osd.68.log:256688:2011-02-17 15:45:20.010705 7fd42ad57940 osd68 5 tick sending mon report
> > osd.68.log:256753:2011-02-17 15:45:20.012950 7fd42ad57940 osd68 5 tick removing stray pgs
> > osd.68.log:256754:2011-02-17 15:45:20.012959 7fd42ad57940 osd68 5 tick sending log to logclient
> > osd.68.log:256755:2011-02-17 15:45:20.012965 7fd42ad57940 osd68 5 tick arming timer for next tick
> > osd.68.log:256756:2011-02-17 15:45:20.012976 7fd42ad57940 osd68 5 tick checking dispatch queue status
> > osd.68.log:256757:2011-02-17 15:45:20.012993 7fd42ad57940 osd68 5 tick done
> >
> > Why should it take 28 seconds to add a new timer event?
> 
> Huh.. that is pretty weird.  I see multiple sync in there, too, so it's
> not like something was somehow blocking on a btrfs commit.
> 
> Anybody else have ideas?  :/

So I thought I'd add debug timer = 20 to the [osd] section 
in my config.  Rebuilding my filesystem resulted in this:

importing contents of /tmp/keyring.mds.an15 into /tmp/monkeyring.10158
creating /tmp/keyring.osd.0
cauthtool: ./common/Mutex.h:118: void Mutex::Lock(bool): Assertion `r == 0' failed.
*** Caught signal (Aborted) ***
in thread 7f6e8f29f6f0
 ceph version 0.24.3 (commit:97d65f2cbca2046fa9a014fcdff1628641d43fd8)
 1: (ceph::BackTrace::BackTrace(int)+0x2a) [0x45cd8c]
 2: /usr/bin/cauthtool [0x466a9d]
 3: /lib64/libpthread.so.0 [0x7f6e8ee93b10]
 4: (gsignal()+0x35) [0x7f6e8dcfe265]
 5: (abort()+0x110) [0x7f6e8dcffd10]
 6: (__assert_fail()+0xf6) [0x7f6e8dcf76e6]
 7: (Mutex::Lock(bool)+0x88) [0x45b5e6]
 8: /usr/bin/cauthtool [0x45ba1e]
 9: (operator<<(std::ostream&, _dbeginl_t)+0x11) [0x45c733]
 a: (SafeTimer::shutdown()+0x2a) [0x484d6e]
 b: /usr/bin/cauthtool [0x480a0b]
 c: (FlusherStopper::~FlusherStopper()+0x11) [0x48322d]
 d: /usr/bin/cauthtool [0x480a2e]
 e: (exit()+0xe5) [0x7f6e8dd013a5]
 f: (__libc_start_main()+0xfb) [0x7f6e8dceb99b]
 10: (__gxx_personality_v0()+0x2d1) [0x44ce09]

I guess I'll rebuild it without timer debugging, then
start it with timer debugging, and see what happens.

-- Jim

> 
> sage
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18  7:13               ` Sage Weil
  2011-02-18 17:04                 ` Jim Schutt
@ 2011-02-18 17:15                 ` Gregory Farnum
  2011-02-18 18:41                 ` Jim Schutt
  2011-02-18 19:07                 ` Colin McCabe
  3 siblings, 0 replies; 94+ messages in thread
From: Gregory Farnum @ 2011-02-18 17:15 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jim Schutt, ceph-devel

On Thursday, February 17, 2011 at 11:13 PM, Sage Weil wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
> > Why should it take 28 seconds to add a new timer event?
> 
> Huh.. that is pretty weird. I see multiple sync in there, too, so it's 
> not like something was somehow blocking on a btrfs commit.
> 
> Anybody else have ideas? :/
> 
> sage
I must be missing something, but I looked through that code (it's short), and as far as I can tell there's no blocking code anywhere at all. The only time it takes any locks is when grabbing the dout lock for those debug prints; it doesn't initiate any file IO or wait for any completions; it just does a few map ops and g_time.now() arithmetic.
It does signal Timer::cond at the end of the timer insert, but the only thing waiting on that should be the timer thread itself, which needs to get back the osd_lock, which is currently held, so that shouldn't be a problem either!
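
To make that concrete, here is a bare-bones sketch (plain C++ threads, not the actual SafeTimer code; osd_lock, timer_lock and add_event here are just stand-ins) of why signaling the cond from the thread holding osd_lock can't block it, even though the woken timer thread then has to wait for osd_lock:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex osd_lock;          // stands in for OSD::osd_lock
std::mutex timer_lock;        // protects the timer's pending-event flag
std::condition_variable cond; // stands in for Timer::cond
bool event_pending = false;

void timer_thread() {
  std::unique_lock<std::mutex> l(timer_lock);
  cond.wait(l, [] { return event_pending; });  // woken by add_event()
  l.unlock();
  std::lock_guard<std::mutex> o(osd_lock);     // blocks until tick() lets go
  std::cout << "timer fired\n";
}

void add_event() {                             // called with osd_lock held
  std::lock_guard<std::mutex> l(timer_lock);
  event_pending = true;
  cond.notify_one();                           // returns immediately, never blocks
}

int main() {
  std::thread t(timer_thread);
  {
    std::lock_guard<std::mutex> o(osd_lock);   // "tick" holds osd_lock...
    add_event();                               // ...and arming the timer is still cheap
  }                                            // osd_lock released here
  t.join();
  return 0;
}
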
-Greg




^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18  7:13               ` Sage Weil
  2011-02-18 17:04                 ` Jim Schutt
  2011-02-18 17:15                 ` Gregory Farnum
@ 2011-02-18 18:41                 ` Jim Schutt
  2011-02-18 19:07                 ` Colin McCabe
  3 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-18 18:41 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-02-18 at 00:13 -0700, Sage Weil wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
> > Check out the result:
> >
> > osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> > osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> > osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> > osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> > osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> > osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> > osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> > osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> > osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> > osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> > osd.68.log:256280:2011-02-17 15:45:18.481705 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> > osd.68.log:256281:2011-02-17 15:45:18.481712 7fd42ad57940 osd68 5 tick got read lock on map_lock
> > osd.68.log:256688:2011-02-17 15:45:20.010705 7fd42ad57940 osd68 5 tick sending mon report
> > osd.68.log:256753:2011-02-17 15:45:20.012950 7fd42ad57940 osd68 5 tick removing stray pgs
> > osd.68.log:256754:2011-02-17 15:45:20.012959 7fd42ad57940 osd68 5 tick sending log to logclient
> > osd.68.log:256755:2011-02-17 15:45:20.012965 7fd42ad57940 osd68 5 tick arming timer for next tick
> > osd.68.log:256756:2011-02-17 15:45:20.012976 7fd42ad57940 osd68 5 tick checking dispatch queue status
> > osd.68.log:256757:2011-02-17 15:45:20.012993 7fd42ad57940 osd68 5 tick done
> >
> > Why should it take 28 seconds to add a new timer event?
> 
> Huh.. that is pretty weird.  I see multiple sync in there, too, so it's
> not like something was somehow blocking on a btrfs commit.

Here's another run; the tick gap is in a different place:

osd.91.log:354239:2011-02-18 10:21:49.391986 7f012c6d7940 osd91 5 tick
osd.91.log:354240:2011-02-18 10:21:49.392059 7f012c6d7940 osd91 5 tick getting read lock on map_lock
osd.91.log:354241:2011-02-18 10:21:49.392067 7f012c6d7940 osd91 5 tick got read lock on map_lock
osd.91.log:354243:2011-02-18 10:21:49.392210 7f012c6d7940 osd91 5 tick sending mon report
osd.91.log:354244:2011-02-18 10:21:49.392217 7f012c6d7940 osd91 5 tick removing stray pgs
osd.91.log:354245:2011-02-18 10:21:49.392225 7f012c6d7940 osd91 5 tick sending log to logclient
osd.91.log:354246:2011-02-18 10:21:49.392231 7f012c6d7940 osd91 5 tick arming timer for next tick
osd.91.log:354247:2011-02-18 10:21:49.392241 7f012c6d7940 osd91 5 tick checking dispatch queue status
osd.91.log:354248:2011-02-18 10:21:49.392247 7f012c6d7940 osd91 5 tick done                              <==
osd.91.log:355120:2011-02-18 10:22:14.948941 7f012c6d7940 osd91 5 tick                                   <== 25 second gap
osd.91.log:355121:2011-02-18 10:22:14.948952 7f012c6d7940 osd91 5 tick getting read lock on map_lock
osd.91.log:355122:2011-02-18 10:22:14.948959 7f012c6d7940 osd91 5 tick got read lock on map_lock
osd.91.log:355338:2011-02-18 10:22:14.956905 7f012c6d7940 osd91 5 tick sending mon report
osd.91.log:355398:2011-02-18 10:22:14.958590 7f012c6d7940 osd91 5 tick removing stray pgs
osd.91.log:355399:2011-02-18 10:22:14.958598 7f012c6d7940 osd91 5 tick sending log to logclient
osd.91.log:355400:2011-02-18 10:22:14.958605 7f012c6d7940 osd91 5 tick arming timer for next tick
osd.91.log:355401:2011-02-18 10:22:14.958615 7f012c6d7940 osd91 5 tick checking dispatch queue status
osd.91.log:355402:2011-02-18 10:22:14.958625 7f012c6d7940 osd91 5 tick done


> Anybody else have ideas?  :/

Hmmm, when I started using 2.6.38-rc kernels I enabled cgroups:

CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_NET_CLS_CGROUP is not set

Since I can't think of anything else to try, I'll
try turning them off.....

-- Jim

> 
> sage
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18  7:13               ` Sage Weil
                                   ` (2 preceding siblings ...)
  2011-02-18 18:41                 ` Jim Schutt
@ 2011-02-18 19:07                 ` Colin McCabe
  2011-02-18 20:48                   ` Jim Schutt
  3 siblings, 1 reply; 94+ messages in thread
From: Colin McCabe @ 2011-02-18 19:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jim Schutt, Gregory Farnum, ceph-devel

> Anybody else have ideas?  :/
>

At the risk of asking a dumb question, does syslog have messages from
ntpd about date/time adjustments?

Also, what is the load on the system like? dout will write to a
logfile using write(2), which of course is a blocking system call. 28
seconds is a long time for a write to block, but things can get weird
under heavy I/O.

A more general problem is priority inversion through the dout lock.
For example, some other piece of code could be doing something like this:

dout << my_happy_printer_function() << dendl

If my_happy_printer_function takes some heavily contended locks, this
will turn into:
1. lock dout lock
2. lock heavily contended locks
3. write(2) to logfile
4. unlock heavily contended locks
5. unlock dout lock

This is one reason why I want to get rid of the dout lock (although
not the only one.)
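
To make the pattern concrete, here is a self-contained sketch (not the real dout macro; my_happy_printer_function and the lock names are invented for the example) of the inversion: every thread that merely wants to log queues up behind dout_lock while the current holder waits for the contended lock and then for the write:

#include <iostream>
#include <mutex>
#include <string>

std::mutex dout_lock;                   // stands in for the global dout lock
std::mutex heavily_contended_lock;      // e.g. a PG or map lock
std::ostream &log_stream = std::cerr;   // stands in for the log file

std::string my_happy_printer_function() {
  // Building the message takes another, possibly contended, lock.
  std::lock_guard<std::mutex> l(heavily_contended_lock);
  return "some state";
}

void log_line() {
  std::lock_guard<std::mutex> l(dout_lock);   // 1. lock dout lock
  log_stream << my_happy_printer_function()   // 2. takes the contended lock
             << std::endl;                    // 3. blocking write to the log
}                                             // 4. unlock dout lock

int main() {
  // Any thread blocked on dout_lock waits for all of the above to finish.
  log_line();
  return 0;
}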

Colin


On Thu, Feb 17, 2011 at 11:13 PM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
>> Check out the result:
>>
>> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
>> osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
>> osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
>> osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
>> osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
>> osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
>> osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
>> osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
>> osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
>> osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18 19:07                 ` Colin McCabe
@ 2011-02-18 20:48                   ` Jim Schutt
  2011-02-18 20:58                     ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-02-18 20:48 UTC (permalink / raw)
  To: Colin McCabe; +Cc: Sage Weil, Gregory Farnum, ceph-devel


On Fri, 2011-02-18 at 12:07 -0700, Colin McCabe wrote:
> > Anybody else have ideas?  :/
> >
> 
> At the risk of asking a dumb question, does syslog have messages from
> ntpd about date/time adjustments?

No, just a few messages like this:

Feb 18 08:34:43 an12 ntpd[4939]: kernel time sync enabled 0001
Feb 18 09:08:53 an12 ntpd[4939]: kernel time sync enabled 4001
Feb 18 10:34:16 an12 ntpd[4939]: kernel time sync enabled 0001
Feb 18 11:25:30 an12 ntpd[4939]: kernel time sync enabled 4001
Feb 18 11:42:36 an12 ntpd[4939]: kernel time sync enabled 0001
Feb 18 12:33:52 an12 ntpd[4939]: kernel time sync enabled 4001
Feb 18 13:08:00 an12 ntpd[4939]: kernel time sync enabled 0001

> 
> Also, what is the load on the system like? dout will write to a
> logfile using write(2), which of course is a blocking system call. 28
> seconds is a long time for a write to block, but things can get weird
> under heavy I/O.

During my tests my systems have high iowait  (40-80%), but CPU
utilization is relatively low.  However, my ceph logs are going 
to a dedicated disk via a dedicated I/O controller; i.e. my 
ceph data disks are FC and my log disk is a local SATA disk.

-- Jim

> 
> A more general problem is priority inversion through the dout lock.
> For example, some other piece of code could be doing something like this
> 
> dout << my_happy_printer_function() << dendl
> 
> If my_happy_printer_function takes some heavily contended locks, this
> will turn into
> 1. lock dout lock
> 2. lock heavily contended locks
> 3. write(2) to logfile
> 4. unlock ""
> 5. unlock dout lock
> 
> This is one reason why I want to get rid of the dout lock (although
> not the only one.)
> 
> Colin
> 
> 
> On Thu, Feb 17, 2011 at 11:13 PM, Sage Weil <sage@newdream.net> wrote:
> > On Thu, 17 Feb 2011, Jim Schutt wrote:
> >> Check out the result:
> >>
> >> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> >> osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> >> osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> >> osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> >> osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> >> osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> >> osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> >> osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> >> osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> >> osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18 20:48                   ` Jim Schutt
@ 2011-02-18 20:58                     ` Sage Weil
  2011-02-18 21:09                       ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-02-18 20:58 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Colin McCabe, Gregory Farnum, ceph-devel

On Fri, 18 Feb 2011, Jim Schutt wrote:
> On Fri, 2011-02-18 at 12:07 -0700, Colin McCabe wrote:
> > > Anybody else have ideas?  :/
> > >
> > 
> > At the risk of asking a dumb question, does syslog have messages from
> > ntpd about date/time adjustments?
> 
> No, just a few messages like this:
> 
> Feb 18 08:34:43 an12 ntpd[4939]: kernel time sync enabled 0001
> Feb 18 09:08:53 an12 ntpd[4939]: kernel time sync enabled 4001
> Feb 18 10:34:16 an12 ntpd[4939]: kernel time sync enabled 0001
> Feb 18 11:25:30 an12 ntpd[4939]: kernel time sync enabled 4001
> Feb 18 11:42:36 an12 ntpd[4939]: kernel time sync enabled 0001
> Feb 18 12:33:52 an12 ntpd[4939]: kernel time sync enabled 4001
> Feb 18 13:08:00 an12 ntpd[4939]: kernel time sync enabled 0001
> 
> > 
> > Also, what is the load on the system like? dout will write to a
> > logfile using write(2), which of course is a blocking system call. 28
> > seconds is a long time for a write to block, but things can get weird
> > under heavy I/O.
> 
> During my tests my systems have high iowait  (40-80%), but CPU
> utilization is relatively low.  However, my ceph logs are going 
> to a dedicated disk via a dedicated I/O controller; i.e. my 
> ceph data disks are FC and my log disk is a local SATA disk.

Also, at least in your earlier example, lots of other crap is going into 
the log in a timely manner.

One thing that might shed some light on this is getting a series of 
snapshots of /proc/$cosd_pid/task/*/{status,sched,syscall} during this 
period.  (We also need to map the pthread ids we see in the logs to the 
task pid...)  It sounds to me like something is preventing the kernel from 
scheduling the thread, unrelated to any pthread-level primitives the code 
is interacting with.  I'm not familiar enough with the scheduling 
internals to know what that might be.
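
Something along these lines would do as a starting point (a rough sketch only; the one-second interval and the fixed list of files are arbitrary choices, and you'd probably want to timestamp and redirect the output):

#include <cstdio>
#include <dirent.h>
#include <unistd.h>

static void dump(const char *path) {
  FILE *f = fopen(path, "r");
  if (!f)
    return;                        // file may not exist on this kernel/config
  printf("==> %s <==\n", path);
  char buf[4096];
  size_t n;
  while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
    fwrite(buf, 1, n, stdout);
  fclose(f);
}

int main(int argc, char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <cosd pid>\n", argv[0]);
    return 1;
  }
  char taskdir[64];
  snprintf(taskdir, sizeof(taskdir), "/proc/%s/task", argv[1]);
  const char *files[] = { "status", "sched", "syscall" };
  for (;;) {
    DIR *d = opendir(taskdir);     // /proc/<pid>/task/<tid>/ is standard procfs
    if (!d)
      return 1;                    // cosd exited
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
      if (de->d_name[0] == '.')
        continue;                  // skip "." and ".."
      for (int i = 0; i < 3; i++) {
        char path[128];
        snprintf(path, sizeof(path), "%s/%s/%s", taskdir, de->d_name, files[i]);
        dump(path);
      }
    }
    closedir(d);
    sleep(1);                      // snapshot interval, tighten as needed
  }
}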

sage


> 
> -- Jim
> 
> > 
> > A more general problem is priority inversion through the dout lock.
> > For example, some other piece of code could be doing something like this
> > 
> > dout << my_happy_printer_function() << dendl
> > 
> > If my_happy_printer_function takes some heavily contended locks, this
> > will turn into
> > 1. lock dout lock
> > 2. lock heavily contended locks
> > 3. write(2) to logfile
> > 4. unlock ""
> > 5. unlock dout lock
> > 
> > This is one reason why I want to get rid of the dout lock (although
> > not the only one.)
> > 
> > Colin
> > 
> > 
> > On Thu, Feb 17, 2011 at 11:13 PM, Sage Weil <sage@newdream.net> wrote:
> > > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > >> Check out the result:
> > >>
> > >> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> > >> osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> > >> osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> > >> osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> > >> osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> > >> osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> > >> osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> > >> osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> > >> osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> > >> osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> > 
> 
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-18 20:58                     ` Sage Weil
@ 2011-02-18 21:09                       ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-18 21:09 UTC (permalink / raw)
  To: Sage Weil; +Cc: Colin McCabe, Gregory Farnum, ceph-devel


On Fri, 2011-02-18 at 13:58 -0700, Sage Weil wrote:
> On Fri, 18 Feb 2011, Jim Schutt wrote:
> > On Fri, 2011-02-18 at 12:07 -0700, Colin McCabe wrote:
> > > > Anybody else have ideas?  :/
> > > >
> > > 
> > > At the risk of asking a dumb question, does syslog have messages from
> > > ntpd about date/time adjustments?
> > 
> > No, just a few messages like this:
> > 
> > Feb 18 08:34:43 an12 ntpd[4939]: kernel time sync enabled 0001
> > Feb 18 09:08:53 an12 ntpd[4939]: kernel time sync enabled 4001
> > Feb 18 10:34:16 an12 ntpd[4939]: kernel time sync enabled 0001
> > Feb 18 11:25:30 an12 ntpd[4939]: kernel time sync enabled 4001
> > Feb 18 11:42:36 an12 ntpd[4939]: kernel time sync enabled 0001
> > Feb 18 12:33:52 an12 ntpd[4939]: kernel time sync enabled 4001
> > Feb 18 13:08:00 an12 ntpd[4939]: kernel time sync enabled 0001
> > 
> > > 
> > > Also, what is the load on the system like? dout will write to a
> > > logfile using write(2), which of course is a blocking system call. 28
> > > seconds is a long time for a write to block, but things can get weird
> > > under heavy I/O.
> > 
> > During my tests my systems have high iowait  (40-80%), but CPU
> > utilization is relatively low.  However, my ceph logs are going 
> > to a dedicated disk via a dedicated I/O controller; i.e. my 
> > ceph data disks are FC and my log disk is a local SATA disk.
> 
> Also, at least in your earlier example, lots of other crap is going into 
> the log in a timely manner.
> 
> One thing that might shed some light on this is getting a series of 
> snapshots of /proc/$cosd_pid/task/*/{status,sched,syscall} during this 
> period. 

I worked on that a little yesterday... 

Right now I am running 8 cosds/server, and such a /proc snapshot
takes ~6 seconds to complete, due to the number of threads
I guess.

I'll need to do some more testing at smaller osd counts to
see if I can reproduce and get such a snapshot to complete
quickly enough to be useful.

Also, I was thinking of trying latencytop, or trying to
learn enough about some of the new kernel diagnostics
to see if there's anything there that might help.

>  (We also need to map the pthread ids we see in the logs to the 
> task pid...)  It sounds to me like something is preventing the kernel from 
> scheduling the thread, unrelated to any pthread-level primitives the code 
> is interacting with. 

I was thinking along those lines - I'll soon have some
results with CONFIG_CGROUP_SCHED disabled.  Since it's
relatively new, maybe it's causing unexpected trouble
here.  But I'm really just guessing.

-- Jim

>  I'm not familiar enough with the scheduling 
> internals to know what that might be.
> 
> sage
> 
> 
> > 
> > -- Jim
> > 
> > > 
> > > A more general problem is priority inversion through the dout lock.
> > > For example, some other piece of code could be doing something like this
> > > 
> > > dout << my_happy_printer_function() << dendl
> > > 
> > > If my_happy_printer_function takes some heavily contended locks, this
> > > will turn into
> > > 1. lock dout lock
> > > 2. lock heavily contended locks
> > > 3. write(2) to logfile
> > > 4. unlock ""
> > > 5. unlock dout lock
> > > 
> > > This is one reason why I want to get rid of the dout lock (although
> > > not the only one.)
> > > 
> > > Colin
> > > 
> > > 
> > > On Thu, Feb 17, 2011 at 11:13 PM, Sage Weil <sage@newdream.net> wrote:
> > > > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > > >> Check out the result:
> > > >>
> > > >> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> > > >> osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> > > >> osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> > > >> osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> > > >> osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> > > >> osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> > > >> osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> > > >> osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> > > >> osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> > > >> osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> > > 
> > 
> > 
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17 16:11           ` Sage Weil
  2011-02-17 23:31             ` Jim Schutt
@ 2011-02-23 17:52             ` Jim Schutt
  2011-02-23 18:12               ` Gregory Farnum
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-02-23 17:52 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

[-- Attachment #1: Type: text/plain, Size: 3194 bytes --]


On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> On Thu, 17 Feb 2011, Jim Schutt wrote:
> > Hi Sage,
> > 
> > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > shouldn't affect anything.  We may have missed something.. do you have a 
> > > > log showing this in action?
> > > 
> > > Obviously yes, looking at your original email.  :)  At the beginning of 
> > > each log line we include a thread id.  What would be really helpful would 
> > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > are blocking, either based on the existing output, or by adding additional 
> > > dout lines at interesting points in time.
> > 
> > I'll take a deeper look at my existing logs with
> > that in mind; let me know if you'd like me to
> > send you some.
> > 
> > I have also been looking at map_lock, as it seems
> > to be shared between the heartbeat and map update
> > threads.
> > 
> > Would instrumenting acquiring/releasing that lock
> > be helpful?  Is there some other lock that may
> > be more fruitful to instrument?  I can reproduce 
> > pretty reliably, so adding instrumentation is 
> > no problem.
> 
> The heartbeat thread is doing a map_lock.try_get_read() because it 
> frequently is held by another thread, so that shouldn't ever block. 
> 
> The possibilities I see are:
>  - peer_stat_lock
>  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> although that code should only trigger with a single osd.  :/
>  - heartbeat_lock itself could be held by another thread; i'd instrument 
> all locks/unlocks there, along with the wakeup in heartbeat().

I think the culprit is osd_lock.

The OSD tick timer needs to acquire the osd_lock mutex to
wake up, right?  So if osd_lock is ever held for an excessive 
time, the tick will be delayed.

I used debug lockdep = 20 on my osds, and saw this via
egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*"

2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9)
2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9)
2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock
2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock
2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock
2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick

I've attached a full log of the 10 second period that osd_lock
was held.

Is it possible to separate out the parts of tick() that
need to be covered by osd_lock and do them somewhere else, 
so that a new lock, say tick_lock, can be used to
protect the tick timer?
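
For what it's worth, here is a rough sketch of that idea (tick_lock, tick_cond, and do_osd_lock_work are hypothetical names; this is not the current OSD code): the timer thread would sleep and re-arm under its own small lock, and take osd_lock only for the work that genuinely needs it, so a long osd_lock hold delays that work but no longer delays the wakeup itself.

#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex osd_lock;                 // the existing big per-OSD lock
std::mutex tick_lock;                // hypothetical: protects only the timer state
std::condition_variable tick_cond;
bool stop_ticking = false;

// The osd_lock-dependent pieces of tick(), split out on their own.
void do_osd_lock_work() {
  std::lock_guard<std::mutex> l(osd_lock);
  // send mon report, remove stray pgs, check dispatch queue, ...
}

void tick_thread() {
  std::unique_lock<std::mutex> l(tick_lock);
  while (!stop_ticking) {
    tick_cond.wait_for(l, std::chrono::seconds(1));   // waking up needs only tick_lock
    if (stop_ticking)
      break;
    l.unlock();
    do_osd_lock_work();   // may still block on osd_lock, but the timer itself does not
    l.lock();
  }
}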

-- Jim


> 
> Thanks for looking at this!
> sage
> 
> 
> 
> > 
> > -- Jim
> > 
> > > 
> > > sage
> > > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 

[-- Attachment #2: osd.70.log.stall.bz2 --]
[-- Type: application/x-bzip, Size: 25916 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-23 17:52             ` Jim Schutt
@ 2011-02-23 18:12               ` Gregory Farnum
  2011-02-23 18:54                 ` Sage Weil
  2011-02-23 19:23                 ` Jim Schutt
  0 siblings, 2 replies; 94+ messages in thread
From: Gregory Farnum @ 2011-02-23 18:12 UTC (permalink / raw)
  To: Jim Schutt; +Cc: ceph-devel


On Wednesday, February 23, 2011 at 9:52 AM, Jim Schutt wrote: 
> 
> On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > > Hi Sage,
> > > 
> > > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > > shouldn't affect anything. We may have missed something.. do you have a 
> > > > > log showing this in action?
> > > > 
> > > > Obviously yes, looking at your original email. :) At the beginning of 
> > > > each log line we include a thread id. What would be really helpful would 
> > > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > > are blocking, either based on the existing output, or by adding additional 
> > > > dout lines at interesting points in time.
> > > 
> > > I'll take a deeper look at my existing logs with
> > > that in mind; let me know if you'd like me to
> > > send you some.
> > > 
> > > I have also been looking at map_lock, as it seems
> > > to be shared between the heartbeat and map update
> > > threads.
> > > 
> > > Would instrumenting acquiring/releasing that lock
> > > be helpful? Is there some other lock that may
> > > be more fruitful to instrument? I can reproduce 
> > > pretty reliably, so adding instrumentation is 
> > > no problem.
> > 
> > The heartbeat thread is doing a map_lock.try_get_read() because it 
> > frequently is held by another thread, so that shouldn't ever block. 
> > 
> > The possibilities I see are:
> >  - peer_stat_lock
> >  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> > although that code should only trigger with a single osd. :/
> >  - heartbeat_lock itself could be held by another thread; i'd instrument 
> > all locks/unlocks there, along with the wakeup in heartbeat().
> 
> I think the culprit is osd_lock.
> 
> The OSD tick timer needs to acquire the osd_lock mutex to
> wake up, right? So if osd_lock is ever held for an excessive 
> time, the tick will be delayed.
> 
> I used debug lockdep = 20 on my osds, and saw this via
> egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*"
> 
> 2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9)
> 2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> 2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9)
> 2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock
> 2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> 2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock
> 2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock
> 2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick
> 
> I've attached a full log of the 10 second period that osd_lock
> was held.
> 
> Is it possible to separate out the parts of tick() that
> need to covered by osd_lock and do them somewhere else, 
> so that a new lock, say tick_lock, can be used to
> protect the tick timer?
> 
> -- Jim
> 

tick() isn't involved in heartbeats, actually, so any blocks there should be unrelated to heartbeat bugs (and once you're in tick() or functions it calls, there's no losing the heartbeat!). 
I have managed to get OSDs wrongly marking each other down during startup when they're peering large numbers of PGs/pools, as they disagree on who they need to be heartbeating (due to the slow handling of new osd maps and pg creates); if you're mostly seeing OSDs get incorrectly marked down during low epochs (your original email said epoch 7), this is probably what you're finding. 

We still have no idea what could be causing the stall *inside* of tick(), though. :/
-Greg




^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-23 18:12               ` Gregory Farnum
@ 2011-02-23 18:54                 ` Sage Weil
  2011-02-23 19:12                   ` Gregory Farnum
  2011-02-23 19:23                 ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-02-23 18:54 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Jim Schutt, ceph-devel

On Wed, 23 Feb 2011, Gregory Farnum wrote:
> 
> On Wednesday, February 23, 2011 at 9:52 AM, Jim Schutt wrote: 
> > 
> > On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> > > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > > > Hi Sage,
> > > > 
> > > > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > > > shouldn't affect anything. We may have missed something.. do you have a 
> > > > > > log showing this in action?
> > > > > 
> > > > > Obviously yes, looking at your original email. :) At the beginning of 
> > > > > each log line we include a thread id. What would be really helpful would 
> > > > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > > > are blocking, either based on the existing output, or by adding additional 
> > > > > dout lines at interesting points in time.
> > > > 
> > > > I'll take a deeper look at my existing logs with
> > > > that in mind; let me know if you'd like me to
> > > > send you some.
> > > > 
> > > > I have also been looking at map_lock, as it seems
> > > > to be shared between the heartbeat and map update
> > > > threads.
> > > > 
> > > > Would instrumenting acquiring/releasing that lock
> > > > be helpful? Is there some other lock that may
> > > > be more fruitful to instrument? I can reproduce 
> > > > pretty reliably, so adding instrumentation is 
> > > > no problem.
> > > 
> > > The heartbeat thread is doing a map_lock.try_get_read() because it 
> > > frequently is held by another thread, so that shouldn't ever block. 
> > > 
> > > The possibilities I see are:
> > >  - peer_stat_lock
> > >  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> > > although that code should only trigger with a single osd. :/
> > >  - heartbeat_lock itself could be held by another thread; i'd instrument 
> > > all locks/unlocks there, along with the wakeup in heartbeat().
> > 
> > I think the culprit is osd_lock.
> > 
> > The OSD tick timer needs to acquire the osd_lock mutex to
> > wake up, right? So if osd_lock is ever held for an excessive 
> > time, the tick will be delayed.
> > 
> > I used debug lockdep = 20 on my osds, and saw this via
> > egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*"
> > 
> > 2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9)
> > 2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> > 2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9)
> > 2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock
> > 2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> > 2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock
> > 2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock
> > 2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick
> > 
> > I've attached a full log of the 10 second period that osd_lock
> > was held.
> > 
> > Is it possible to separate out the parts of tick() that
> > need to covered by osd_lock and do them somewhere else, 
> > so that a new lock, say tick_lock, can be used to
> > protect the tick timer?
> > 
> > -- Jim
> > 
> 
> tick() isn't involved in heartbeats, actually, so any blocks there 
> should be unrelated to heartbeat bugs (and once you're in tick() or 
> functions it calls, there's no losing the heartbeat!).

Right.  I'm not sure tick is related to the problem?

> I have managed to get OSDs wrongly marking each other down during 
> startup when they're peering large numbers of PGs/pools, as they 
> disagree on who they need to be heartbeating (due to the slow handling 
> of new osd maps and pg creates); if you're mostly seeing OSDs get 
> incorrectly marked down during low epochs (your original email said 
> epoch 7) this is probably what you're finding.

FWIW, this isn't supposed to happen either... the implementation may be 
somewhat broken.  The idea is that once an OSD starts to expect a 
heartbeat from a peer, it should tell that peer so.  And if an OSD is told 
that a future epoch says it should send heartbeats to node foo, then it 
will do so, at least until it processes that epoch.
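
Spelled out as a sketch (epoch_t, heartbeat_to, forced_until, and the notification itself are hypothetical names; the real OSD code may structure this differently), the intended handshake would look something like:

#include <map>
#include <set>

using epoch_t = unsigned int;

std::set<int> heartbeat_to;            // peers we currently send pings to
std::map<int, epoch_t> forced_until;   // peers we ping because a future epoch says so

// A: we start expecting pings from 'peer' as of epoch e -- tell the peer.
void expect_heartbeat_from(int peer, epoch_t e) {
  // send the peer a "you owe me pings as of epoch e" style notification (hypothetical)
}

// B: a peer told us a future epoch wants us to ping it -- honor that immediately,
// even though we have not processed that epoch ourselves yet.
void handle_expectation(int peer, epoch_t e, epoch_t my_epoch) {
  if (e > my_epoch) {
    heartbeat_to.insert(peer);
    forced_until[peer] = e;
  }
}

// Once we actually process epoch e, the map itself determines the peer set again.
void on_map_processed(epoch_t e) {
  for (auto it = forced_until.begin(); it != forced_until.end(); ) {
    if (it->second <= e)
      it = forced_until.erase(it);   // recompute this peer from the map as usual
    else
      ++it;
  }
}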

If you have logs reproducing this, let's take a closer look so we can fix 
it.  The implementation has probably drifted from the original 
design/intent... :)

> We still have no idea what could be causing the stall *inside* of 
> tick(), though. :/

You mean heartbeat(), right?  Yep, still no clue...  :(

sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-23 18:54                 ` Sage Weil
@ 2011-02-23 19:12                   ` Gregory Farnum
  0 siblings, 0 replies; 94+ messages in thread
From: Gregory Farnum @ 2011-02-23 19:12 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jim Schutt, ceph-devel


On Wednesday, February 23, 2011 at 10:54 AM, Sage Weil wrote: 
> On Wed, 23 Feb 2011, Gregory Farnum wrote:
> > I have managed to get OSDs wrongly marking each other down during 
> > startup when they're peering large numbers of PGs/pools, as they 
> > disagree on who they need to be heartbeating (due to the slow handling 
> > of new osd maps and pg creates); if you're mostly seeing OSDs get 
> > incorrectly marked down during low epochs (your original email said 
> > epoch 7) this is probably what you're finding.
> 
> FWIW, this isn't supposed to happen either.. the implementation may be 
> broken somewhat. The idea is that once an OSD starts to expect a 
> heartbeat it should tell them so. And if an OSD is told that a future 
> epoch says it should send heartbeats to node foo, then it will do so, at 
> least until it processes that epoch.
Hmmm -- I don't think they're telling the other OSDs that they're heartbeat partners! At least I didn't see anything that would make that happen. They just start expecting pings, and in some cases they will start sending them because they notice they're a local replica too, but there's nothing in those messages like "you owe me pings as of epoch x".
Are there stubs you know of that I should look at in re-implementing this behavior?

> > We still have no idea what could be causing the stall *inside* of 
> > tick(), though. :/
> 
> You mean heartbeat(), right? Yep, still no clue... :(
> 
Well, the 28-second stall is inside of tick() as it arms a timer for the next tick. Heartbeat is definitely failing, but nobody's quite sure why, as I recall. 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-23 18:12               ` Gregory Farnum
  2011-02-23 18:54                 ` Sage Weil
@ 2011-02-23 19:23                 ` Jim Schutt
  2011-02-23 20:27                   ` Gregory Farnum
  2011-03-02  0:53                   ` Sage Weil
  1 sibling, 2 replies; 94+ messages in thread
From: Jim Schutt @ 2011-02-23 19:23 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel


On Wed, 2011-02-23 at 11:12 -0700, Gregory Farnum wrote:
> On Wednesday, February 23, 2011 at 9:52 AM, Jim Schutt wrote: 
> > 
> > On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> > > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > > > Hi Sage,
> > > > 
> > > > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > > > shouldn't affect anything. We may have missed something.. do you have a 
> > > > > > log showing this in action?
> > > > > 
> > > > > Obviously yes, looking at your original email. :) At the beginning of 
> > > > > each log line we include a thread id. What would be really helpful would 
> > > > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > > > are blocking, either based on the existing output, or by adding additional 
> > > > > dout lines at interesting points in time.
> > > > 
> > > > I'll take a deeper look at my existing logs with
> > > > that in mind; let me know if you'd like me to
> > > > send you some.
> > > > 
> > > > I have also been looking at map_lock, as it seems
> > > > to be shared between the heartbeat and map update
> > > > threads.
> > > > 
> > > > Would instrumenting acquiring/releasing that lock
> > > > be helpful? Is there some other lock that may
> > > > be more fruitful to instrument? I can reproduce 
> > > > pretty reliably, so adding instrumentation is 
> > > > no problem.
> > > 
> > > The heartbeat thread is doing a map_lock.try_get_read() because it 
> > > frequently is held by another thread, so that shouldn't ever block. 
> > > 
> > > The possibilities I see are:
> > >  - peer_stat_lock
> > >  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> > > although that code should only trigger with a single osd. :/
> > >  - heartbeat_lock itself could be held by another thread; i'd instrument 
> > > all locks/unlocks there, along with the wakeup in heartbeat().
> > 
> > I think the culprit is osd_lock.
> > 
> > The OSD tick timer needs to acquire the osd_lock mutex to
> > wake up, right? So if osd_lock is ever held for an excessive 
> > time, the tick will be delayed.
> > 
> > I used debug lockdep = 20 on my osds, and saw this via
> > egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*"
> > 
> > 2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9)
> > 2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> > 2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9)
> > 2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock
> > 2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> > 2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock
> > 2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock
> > 2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick
> > 
> > I've attached a full log of the 10 second period that osd_lock
> > was held.
> > 
> > Is it possible to separate out the parts of tick() that
> > need to covered by osd_lock and do them somewhere else, 
> > so that a new lock, say tick_lock, can be used to
> > protect the tick timer?
> > 
> > -- Jim
> > 
> 
> tick() isn't involved in heartbeats, actually, so any blocks there should be unrelated to heartbeat bugs (and once you're in tick() or functions it calls, there's no losing the heartbeat!). 

I guess I was confused by tick() calling heartbeat_check().
In all the logs I collect I find the trouble spots by
looking for when tick() is delayed.

> I have managed to get OSDs wrongly marking each other down during startup when they're peering large numbers of PGs/pools, as they disagree on who they need to be heartbeating (due to the slow handling of new osd maps and pg creates); if you're mostly seeing OSDs get incorrectly marked down during low epochs (your original email said epoch 7) this is probably what you're finding. 
> 

What I've been trying to look for is heartbeat stalls after I 
start up a bunch of clients writing.  I'm really not sure why that
original log caught one at such an early epoch - maybe there are
two things going on?

> We still have no idea what could be causing the stall *inside* of tick(), though. :/

I think that one was just lucky.  Most of the stalls I've
collected are between ticks.

-- Jim

> -Greg
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-23 19:23                 ` Jim Schutt
@ 2011-02-23 20:27                   ` Gregory Farnum
  2011-03-02  0:53                   ` Sage Weil
  1 sibling, 0 replies; 94+ messages in thread
From: Gregory Farnum @ 2011-02-23 20:27 UTC (permalink / raw)
  To: Jim Schutt; +Cc: ceph-devel

On Wednesday, February 23, 2011 at 11:23 AM, Jim Schutt wrote:
> > I have managed to get OSDs wrongly marking each other down during startup when they're peering large numbers of PGs/pools, as they disagree on who they need to be heartbeating (due to the slow handling of new osd maps and pg creates); if you're mostly seeing OSDs get incorrectly marked down during low epochs (your original email said epoch 7) this is probably what you're finding. 
> 
> What I've been trying to look for is heartbeat stalls after I 
> start up a bunch of clients writing. I'm really not sure why that
> original log caught one at such an early epoch - maybe there's
> two things going on?
> 
That wouldn't surprise me too much, but is something to keep in mind when observing. :)

> > We still have no idea what could be causing the stall *inside* of tick(), though. :/
> 
> I think that one was just lucky. Most of the stalls I've
> collected are between ticks.
Stalls between ticks make a lot of sense, since tick() requires the osd_lock and we have some functions holding it for way too long. But as far as we can tell, a stalled tick() shouldn't break anything -- heartbeats are sent independently, and all the processing of received heartbeats (where you detect down OSDs) is done inside tick() in such a way that delivery of heartbeats isn't lost -- so that shouldn't be a problem!
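
As a minimal sketch of the checking side (the names, the map, and the 20-second grace value are hypothetical, not the actual heartbeat_check() code): incoming pings only update a timestamp, so a delayed tick() delays the check below, it does not drop the pings themselves.

#include <chrono>
#include <map>

using hb_clock = std::chrono::steady_clock;

std::map<int, hb_clock::time_point> last_ping_from;   // updated when a ping arrives
const auto hb_grace = std::chrono::seconds(20);       // hypothetical grace period

// Called periodically (e.g. from tick()); anything silent past the cutoff is reported.
void heartbeat_check_sketch() {
  const auto cutoff = hb_clock::now() - hb_grace;
  for (const auto& [peer, when] : last_ping_from) {
    if (when < cutoff) {
      // report "no heartbeat from osd<peer>" so the monitor can consider marking it down
    }
  }
}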




^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-23 19:23                 ` Jim Schutt
  2011-02-23 20:27                   ` Gregory Farnum
@ 2011-03-02  0:53                   ` Sage Weil
  2011-03-02 15:21                     ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-02  0:53 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

Hi Jim,

We've fixed a few different bugs over the last week that were causing 
heartbeat issues.  Nothing that explains why we would see the hang that 
you did, but other problems that caused the same 'wrongly marked me down' 
issue.  Are you still seeing this problem with the latest 'next' and/or 
'master' branch?

Also, if you don't mind reproducing, can you post a larger segment of the 
log?  The really interesting question is what the heartbeat thread 
(heartbeat_entry()) is doing during this period that tick() is blocked up, 
since that's the thread that's responsible for sending the ping messages 
to peer OSDs.

Thanks!
sage



On Wed, 23 Feb 2011, Jim Schutt wrote:

> 
> On Wed, 2011-02-23 at 11:12 -0700, Gregory Farnum wrote:
> > On Wednesday, February 23, 2011 at 9:52 AM, Jim Schutt wrote: 
> > > 
> > > On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> > > > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > > > > Hi Sage,
> > > > > 
> > > > > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > > > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > > > > shouldn't affect anything. We may have missed something.. do you have a 
> > > > > > > log showing this in action?
> > > > > > 
> > > > > > Obviously yes, looking at your original email. :) At the beginning of 
> > > > > > each log line we include a thread id. What would be really helpful would 
> > > > > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things 
> > > > > > are blocking, either based on the existing output, or by adding additional 
> > > > > > dout lines at interesting points in time.
> > > > > 
> > > > > I'll take a deeper look at my existing logs with
> > > > > that in mind; let me know if you'd like me to
> > > > > send you some.
> > > > > 
> > > > > I have also been looking at map_lock, as it seems
> > > > > to be shared between the heartbeat and map update
> > > > > threads.
> > > > > 
> > > > > Would instrumenting acquiring/releasing that lock
> > > > > be helpful? Is there some other lock that may
> > > > > be more fruitful to instrument? I can reproduce 
> > > > > pretty reliably, so adding instrumentation is 
> > > > > no problem.
> > > > 
> > > > The heartbeat thread is doing a map_lock.try_get_read() because it 
> > > > frequently is held by another thread, so that shouldn't ever block. 
> > > > 
> > > > The possibilities I see are:
> > > >  - peer_stat_lock
> > > >  - the monc->sub_want / renew_subs calls (monc has an internal lock), 
> > > > although that code should only trigger with a single osd. :/
> > > >  - heartbeat_lock itself could be held by another thread; i'd instrument 
> > > > all locks/unlocks there, along with the wakeup in heartbeat().
> > > 
> > > I think the culprit is osd_lock.
> > > 
> > > The OSD tick timer needs to acquire the osd_lock mutex to
> > > wake up, right? So if osd_lock is ever held for an excessive 
> > > time, the tick will be delayed.
> > > 
> > > I used debug lockdep = 20 on my osds, and saw this via
> > > egrep "10:07:(2[2-9]|3[0-3])" osd.70.log | egrep "tick|osd_lock" | grep -v "tick .*"
> > > 
> > > 2011-02-23 10:07:23.454954 0x7f51a17fa940 lockdep: _will_lock OSD::osd_lock (9)
> > > 2011-02-23 10:07:23.455118 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> > > 2011-02-23 10:07:23.455254 0x7f51a37fe940 lockdep: _will_lock OSD::osd_lock (9)
> > > 2011-02-23 10:07:23.455262 0x7f51a37fe940 lockdep: _locked OSD::osd_lock
> > > 2011-02-23 10:07:33.008092 0x7f51a37fe940 lockdep: _will_unlock OSD::osd_lock
> > > 2011-02-23 10:07:33.009135 0x7f51a17fa940 lockdep: _locked OSD::osd_lock
> > > 2011-02-23 10:07:33.009217 0x7f51a17fa940 lockdep: _will_unlock OSD::osd_lock
> > > 2011-02-23 10:07:33.009307 0x7f51ad89d940 osd70 149 tick
> > > 
> > > I've attached a full log of the 10 second period that osd_lock
> > > was held.
> > > 
> > > Is it possible to separate out the parts of tick() that
> > > need to covered by osd_lock and do them somewhere else, 
> > > so that a new lock, say tick_lock, can be used to
> > > protect the tick timer?
> > > 
> > > -- Jim
> > > 
> > 
> > tick() isn't involved in heartbeats, actually, so any blocks there should be unrelated to heartbeat bugs (and once you're in tick() or functions it calls, there's no losing the heartbeat!). 
> 
> I guess I was confused by tick() calling heartbeat_check().
> In all the logs I collect I find the trouble spots by
> looking for when tick() delayed.
> 
> > I have managed to get OSDs wrongly marking each other down during startup when they're peering large numbers of PGs/pools, as they disagree on who they need to be heartbeating (due to the slow handling of new osd maps and pg creates); if you're mostly seeing OSDs get incorrectly marked down during low epochs (your original email said epoch 7) this is probably what you're finding. 
> > 
> 
> What I've been trying to look for is heartbeat stalls after I 
> start up a bunch of clients writing.  I'm really not sure why that
> original log caught one at such an early epoch - maybe there's
> two things going on?
> 
> > We still have no idea what could be causing the stall *inside* of tick(), though. :/
> 
> I think that one was just lucky.  Most of the stalls I've
> collected are between ticks.
> 
> -- Jim
> 
> > -Greg
> > 
> > 
> > 
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02  0:53                   ` Sage Weil
@ 2011-03-02 15:21                     ` Jim Schutt
  2011-03-02 17:10                       ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-02 15:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Tue, 2011-03-01 at 17:53 -0700, Sage Weil wrote:
> Hi Jim,
> 
> We've fixed a few different bugs over the last week that were causing 
> heartbeat issues. 

Great!

>  Nothing that explains why we would see the hang that 
> you did, but other problems that caused the same 'wrongly marked me down' 
> issue.  Are you still seeing this problem with the latest 'next' and/or 
> 'master' branch?

I've been trying to isolate this on the stable branch
since my last posting - I can still reproduce at will
with my 96 osd test, but I haven't made much progress
at tracking down what is going wrong.

> 
> Also, if you don't mind reproducing, can you post a larger segment of the 
> log? 

Sure.  I've got some extra debug printing going in
my tree - the most interesting is a patch to log
queue, operation, and total elapsed times in
dispatch_entry() - it makes it really easy to
find when things go wrong.
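
In case it helps, a sketch of that sort of instrumentation (hypothetical names; Jim's actual patch may differ): stamp each message when it is queued, then log how long it waited and how long the dispatch call itself took.

#include <chrono>
#include <cstdio>

struct QueuedMessage {
  std::chrono::steady_clock::time_point enqueued_at;   // set when the message is queued
  // ... actual message payload ...
};

void dispatch_entry_sketch(QueuedMessage& m) {
  const auto start = std::chrono::steady_clock::now();
  const auto queue_us = std::chrono::duration_cast<std::chrono::microseconds>(
      start - m.enqueued_at).count();

  // ms_deliver_dispatch(...);   // the real dispatch work would happen here

  const auto end = std::chrono::steady_clock::now();
  const auto op_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::printf("dispatch_entry: queue %lld us, op %lld us, total %lld us\n",
              (long long)queue_us, (long long)op_us, (long long)(queue_us + op_us));
}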

I'll try to reproduce with master and post logs.
Is it OK for me to add my extra debug patches for
that?  I'll post them with the logs if so.

>  The really interesting question is what the heartbeat thread 
> (heartbeat_entry()) is doing during this period that tick() is blocked up, 
> since that's the thread that's responsible for sending the ping messages 
> to peer OSDs.

One of the things I am seeing is handle_osd_ping()
getting stalled, but I haven't been able to track
down why.

I'll see if I see the same signature with master,
and post logs.

-- Jim

> 
> Thanks!
> sage
> 
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 15:21                     ` Jim Schutt
@ 2011-03-02 17:10                       ` Sage Weil
  2011-03-02 20:54                         ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-02 17:10 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Wed, 2 Mar 2011, Jim Schutt wrote:
> On Tue, 2011-03-01 at 17:53 -0700, Sage Weil wrote:
> > Hi Jim,
> > 
> > We've fixed a few different bugs over the last week that were causing 
> > heartbeat issues. 
> 
> Great!
> 
> >  Nothing that explains why we would see the hang that 
> > you did, but other problems that caused the same 'wrongly marked me down' 
> > issue.  Are you still seeing this problem with the latest 'next' and/or 
> > 'master' branch?
> 
> I've been trying to isolate this on the stable branch
> since my last posting - I can still reproduce at will
> with my 96 osd test, but I haven't made much progress
> at tracking down what is going wrong.
> 
> > 
> > Also, if you don't mind reproducing, can you post a larger segment of the 
> > log? 
> 
> Sure.  I've got some extra debug printing going in
> my tree - the most interesting is a patch to log
> queue, operation, and total elapsed times in
> dispatch_entry() - it makes is really easy to
> find when things go wrong.
>
> I'll try to reproduce with master and post logs.
> Is it OK for me to add my extra debug patches for
> that?  I'll post them with the logs if so.

Absolutely.

> >  The really interesting question is what the heartbeat thread 
> > (heartbeat_entry()) is doing during this period that tick() is blocked up, 
> > since that's the thread that's responsible for sending the ping messages 
> > to peer OSDs.
> 
> One of the things I am seeing is handle_osd_ping()
> getting stalled, but I haven't been able to track
> down why.
> 
> I'll see if I see the same signature with master,
> and post logs.

Thanks!  Keep us posted.
sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 17:10                       ` Sage Weil
@ 2011-03-02 20:54                         ` Jim Schutt
  2011-03-02 21:45                           ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-02 20:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > I'll see if I see the same signature with master,
> > and post logs.
> 
> Thanks!  Keep us posted.

Hmmm, I'm not having much luck with master (commit 
0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
lots of dead OSDs during startup.

I used to use a global chdir option to redirect
my core files; my servers are readonly NFS-root, 
with /root on a ramdisk, so being able to point
those core files at persistent storage was useful.
That seems to have changed somewhat.

The backtraces all seem to look like this:

(gdb) bt
#0  0x00007f5d496d79dd in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1  0x000000000089bd67 in handle_fatal_signal (signum=6) at common/signal.cc:78
#2  <signal handler called>
#3  0x00007f5d48542265 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4  0x00007f5d48543d10 in abort () at abort.c:88
#5  0x00007f5d48db8cb4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#6  0x00007f5d48db6db6 in ?? () from /usr/lib64/libstdc++.so.6
#7  0x00007f5d48db6de3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#8  0x00007f5d48db6eca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#9  0x000000000066b1ac in ceph::buffer::list::iterator::copy (this=0x7f5d3f4fde40, len=2, dest=0x7f5d3f4fd77e "") at ./include/buffer.h:637
#10 0x00000000006d998c in decode_raw<__le16> (t=@0x7f5d3f4fd77e, p=...) at ./include/encoding.h:35
#11 0x0000000000769a27 in decode (v=@0x7f5d3f4fd7ae, p=...) at ./include/encoding.h:82
#12 0x000000000079c857 in OSDMap::Incremental::decode (this=0x7f5d3f4fd8a0, p=...) at osd/OSDMap.h:204
#13 0x0000000000731803 in OSD::handle_osd_map (this=0x2266af0, m=0x24b2c30) at osd/OSD.cc:2859
#14 0x0000000000733198 in OSD::_dispatch (this=0x2266af0, m=0x24b2c30) at osd/OSD.cc:2428
#15 0x00000000007344e0 in OSD::ms_dispatch (this=0x2266af0, m=0x24b2c30) at osd/OSD.cc:2301
#16 0x000000000068ab9d in Messenger::ms_deliver_dispatch (this=0x220aa30, m=0x24b2c30) at msg/Messenger.h:97
#17 0x0000000000677894 in SimpleMessenger::dispatch_entry (this=0x220aa30) at msg/SimpleMessenger.cc:357
#18 0x000000000066eb2d in SimpleMessenger::DispatchThread::entry (this=0x220aed0) at msg/SimpleMessenger.h:534
#19 0x000000000068b258 in Thread::_entry_func (arg=0x220aed0) at ./common/Thread.h:47
#20 0x00007f5d496cf73d in start_thread (arg=<value optimized out>) at pthread_create.c:301
#21 0x00007f5d485e5f6d in clone () from /lib64/libc.so.6

-- Jim

> sage
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 20:54                         ` Jim Schutt
@ 2011-03-02 21:45                           ` Sage Weil
  2011-03-02 21:59                             ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-02 21:45 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Wed, 2 Mar 2011, Jim Schutt wrote:
> 
> On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > > I'll see if I see the same signature with master,
> > > and post logs.
> > 
> > Thanks!  Keep us posted.
> 
> Hmmm, I'm not having much luck with master (commit 
> 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> lots of dead OSDs during startup.

Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.

> I used to use a global chdir option to redirect
> my core files; my servers are readonly NFS-root, 
> with /root on a ramdisk, so being able to point
> those core files at persistent storage was useful.
> That seems to have changed somewhat.

Hmm, I don't think this behavior should have changed.  Can you look at a 
running daemon's /proc/$pid/cwd and see if it's incorrect?

Thanks-
sage


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 21:45                           ` Sage Weil
@ 2011-03-02 21:59                             ` Jim Schutt
  2011-03-02 22:57                               ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-02 21:59 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
> On Wed, 2 Mar 2011, Jim Schutt wrote:
> > 
> > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > > > I'll see if I see the same signature with master,
> > > > and post logs.
> > > 
> > > Thanks!  Keep us posted.
> > 
> > Hmmm, I'm not having much luck with master (commit 
> > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> > lots of dead OSDs during startup.
> 
> Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.

I'll try it out, thanks!

> 
> > I used to use a global chdir option to redirect
> > my core files; my servers are readonly NFS-root, 
> > with /root on a ramdisk, so being able to point
> > those core files at persistent storage was useful.
> > That seems to have changed somewhat.
> 
> Hmm, I don't think this behavior should have changed.  Can you look at a 
> running daemon's /proc/$pid/cwd and see if it's incorrect?

# cat /proc/13120/cmdline
/usr/bin/cosd-i0-c/etc/ceph/ceph.conf

# ls -l /proc/13120/cwd
lrwxrwxrwx 1 root root 0 Mar  2 14:49 /proc/13120/cwd -> /ram/root

# head -3 /etc/ceph/ceph.conf 

[global]
        chdir = /var/log/ceph ; Core files end up here

# ls /var/log/ceph
lost+found  osd.0.log  osd.1.log  osd.2.log  osd.3.log  osd.4.log  osd.5.log  osd.6.log  osd.7.log

# df -h /var/log/ceph
Filesystem            Size  Used Avail Use% Mounted on
/dev/disk/by-path/scsi-0:2:00:00p1
                       33G  2.0G   31G   6% /ram/var/log/ceph

I'm not sure either what could have happened.  I'll 
try to figure out if I'm doing something differently...

Thanks -- Jim

> 
> Thanks-
> sage
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 21:59                             ` Jim Schutt
@ 2011-03-02 22:57                               ` Jim Schutt
  2011-03-02 23:20                                 ` Gregory Farnum
                                                   ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-02 22:57 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
> 
> On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
> > On Wed, 2 Mar 2011, Jim Schutt wrote:
> > > 
> > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > > > > I'll see if I see the same signature with master,
> > > > > and post logs.
> > > > 
> > > > Thanks!  Keep us posted.
> > > 
> > > Hmmm, I'm not having much luck with master (commit 
> > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> > > lots of dead OSDs during startup.
> > 
> > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
> 
> I try it out, thanks!

I don't get any more core files with master commit 67355779ecc.
Now my cosds just die - no stack trace in the log, no core
file, nothing in syslog or dmesg ...

I'm not sure how to track down what's happening here...

-- Jim




^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 22:57                               ` Jim Schutt
@ 2011-03-02 23:20                                 ` Gregory Farnum
  2011-03-02 23:25                                   ` Jim Schutt
  2011-03-03  2:26                                 ` Colin McCabe
  2011-03-03  5:03                                 ` Sage Weil
  2 siblings, 1 reply; 94+ messages in thread
From: Gregory Farnum @ 2011-03-02 23:20 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, ceph-devel

On Wed, Mar 2, 2011 at 2:57 PM, Jim Schutt <jaschut@sandia.gov> wrote:
>
> On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
>>
>> On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
>> > On Wed, 2 Mar 2011, Jim Schutt wrote:
>> > >
>> > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
>> > > > > I'll see if I see the same signature with master,
>> > > > > and post logs.
>> > > >
>> > > > Thanks!  Keep us posted.
>> > >
>> > > Hmmm, I'm not having much luck with master (commit
>> > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
>> > > lots of dead OSDs during startup.
>> >
>> > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
>>
>> I try it out, thanks!
>
> I don't get any more core files with master commit 67355779ecc.
> Now my cosds just die - no stack trace in the log, no core
> file, nothing in syslog or dmesg ...
Another commit got in that changed the logging behavior slightly --
which log file are you opening?
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 23:20                                 ` Gregory Farnum
@ 2011-03-02 23:25                                   ` Jim Schutt
  2011-03-02 23:33                                     ` Gregory Farnum
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-02 23:25 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel


On Wed, 2011-03-02 at 16:20 -0700, Gregory Farnum wrote:
> On Wed, Mar 2, 2011 at 2:57 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> >
> > On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
> >>
> >> On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
> >> > On Wed, 2 Mar 2011, Jim Schutt wrote:
> >> > >
> >> > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> >> > > > > I'll see if I see the same signature with master,
> >> > > > > and post logs.
> >> > > >
> >> > > > Thanks!  Keep us posted.
> >> > >
> >> > > Hmmm, I'm not having much luck with master (commit
> >> > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> >> > > lots of dead OSDs during startup.
> >> >
> >> > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
> >>
> >> I try it out, thanks!
> >
> > I don't get any more core files with master commit 67355779ecc.
> > Now my cosds just die - no stack trace in the log, no core
> > file, nothing in syslog or dmesg ...
> Another commit got in that changed the logging behavior slightly --
> which log file are you opening?
> 

Well, I don't have any specific logging config.
So my logs would show up in /var/log/ceph, and
they still seem to be there.  They contain logging
info, just not a stack trace that might explain
why the cosd died.

-- Jim




^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 23:25                                   ` Jim Schutt
@ 2011-03-02 23:33                                     ` Gregory Farnum
  0 siblings, 0 replies; 94+ messages in thread
From: Gregory Farnum @ 2011-03-02 23:33 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, ceph-devel

On Wednesday, March 2, 2011 at 3:25 PM, Jim Schutt wrote:

> On Wed, 2011-03-02 at 16:20 -0700, Gregory Farnum wrote:
> > On Wed, Mar 2, 2011 at 2:57 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> > > 
> > > On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
> > > > 
> > > > On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
> > > > > On Wed, 2 Mar 2011, Jim Schutt wrote:
> > > > > > 
> > > > > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > > > > > > > I'll see if I see the same signature with master,
> > > > > > > > and post logs.
> > > > > > > 
> > > > > > > Thanks! Keep us posted.
> > > > > > 
> > > > > > Hmmm, I'm not having much luck with master (commit
> > > > > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> > > > > > lots of dead OSDs during startup.
> > > > > 
> > > > > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
> > > > 
> > > > I try it out, thanks!
> > > 
> > > I don't get any more core files with master commit 67355779ecc.
> > > Now my cosds just die - no stack trace in the log, no core
> > > file, nothing in syslog or dmesg ...
> > Another commit got in that changed the logging behavior slightly --
> > which log file are you opening?
> 
> Well, I don't have any specific logging config.
> So my logs would show up in /var/log/ceph, and
> they still seem to be there. They contain logging
> info, just not a stack trace that might explain
> why the cosd died.
> 
> -- Jim

What does the end of your log look like? We can at least look at what it was doing. :) 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 22:57                               ` Jim Schutt
  2011-03-02 23:20                                 ` Gregory Farnum
@ 2011-03-03  2:26                                 ` Colin McCabe
  2011-03-03 20:03                                   ` Jim Schutt
  2011-03-03  5:03                                 ` Sage Weil
  2 siblings, 1 reply; 94+ messages in thread
From: Colin McCabe @ 2011-03-03  2:26 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, Gregory Farnum, ceph-devel

Hi Jim,

We have seen this problem before. The usual suspect is the OOM
killer (grep for "out of memory" in syslog).
Unfortunately, SIGKILL is uncatchable and that's what the OOM killer sends.

Another problem that can prevent core files from being generated is
bad ulimit -c settings or a bad setting for core_pattern and friends.
One problem I have a lot too is that the partition I'm writing core
files to fills up.

If none of that works, it's possible that someone is calling exit()
somewhere. You can attach a gdb to the process and put a breakpoint on
exit() to see if this is going on. There's a lot of "your foo is not
bar enough, I hate your config, exit(1)" type code that gets executed
while the daemon is starting up. It sounds like you should be past
that point, though.

Colin


On Wed, Mar 2, 2011 at 2:57 PM, Jim Schutt <jaschut@sandia.gov> wrote:
>
> On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
>>
>> On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
>> > On Wed, 2 Mar 2011, Jim Schutt wrote:
>> > >
>> > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
>> > > > > I'll see if I see the same signature with master,
>> > > > > and post logs.
>> > > >
>> > > > Thanks!  Keep us posted.
>> > >
>> > > Hmmm, I'm not having much luck with master (commit
>> > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
>> > > lots of dead OSDs during startup.
>> >
>> > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
>>
>> I try it out, thanks!
>
> I don't get any more core files with master commit 67355779ecc.
> Now my cosds just die - no stack trace in the log, no core
> file, nothing in syslog or dmesg ...
>
> I'm not sure how to track down what's happening here...
>
> -- Jim
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-02 22:57                               ` Jim Schutt
  2011-03-02 23:20                                 ` Gregory Farnum
  2011-03-03  2:26                                 ` Colin McCabe
@ 2011-03-03  5:03                                 ` Sage Weil
  2011-03-03 16:35                                   ` Jim Schutt
  2011-03-03 17:28                                   ` Jim Schutt
  2 siblings, 2 replies; 94+ messages in thread
From: Sage Weil @ 2011-03-03  5:03 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

Hi Jim,

On Wed, 2 Mar 2011, Jim Schutt wrote:

> 
> On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
> > 
> > On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
> > > On Wed, 2 Mar 2011, Jim Schutt wrote:
> > > > 
> > > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > > > > > I'll see if I see the same signature with master,
> > > > > > and post logs.
> > > > > 
> > > > > Thanks!  Keep us posted.
> > > > 
> > > > Hmmm, I'm not having much luck with master (commit 
> > > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> > > > lots of dead OSDs during startup.
> > > 
> > > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
> > 
> > I try it out, thanks!
> 
> I don't get any more core files with master commit 67355779ecc.
> Now my cosds just die - no stack trace in the log, no core
> file, nothing in syslog or dmesg ...
> 
> I'm not sure how to track down what's happening here...

Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
available at the moment).  Seeing the last bit of the logs on the crashed 
nodes will help.

I pushed a fix for the chdir issue, though!

Thanks-
sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03  5:03                                 ` Sage Weil
@ 2011-03-03 16:35                                   ` Jim Schutt
  2011-03-03 17:28                                   ` Jim Schutt
  1 sibling, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 16:35 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> Hi Jim,
> 
> On Wed, 2 Mar 2011, Jim Schutt wrote:
> 
> > 
> > On Wed, 2011-03-02 at 14:59 -0700, Jim Schutt wrote:
> > > 
> > > On Wed, 2011-03-02 at 14:45 -0700, Sage Weil wrote:
> > > > On Wed, 2 Mar 2011, Jim Schutt wrote:
> > > > > 
> > > > > On Wed, 2011-03-02 at 10:10 -0700, Sage Weil wrote:
> > > > > > > I'll see if I see the same signature with master,
> > > > > > > and post logs.
> > > > > > 
> > > > > > Thanks!  Keep us posted.
> > > > > 
> > > > > Hmmm, I'm not having much luck with master (commit 
> > > > > 0fb5ef2ce92 + extra debugging) on a 96-osd filesystem;
> > > > > lots of dead OSDs during startup.
> > > > 
> > > > Commit c916905a8a14029653aae45f0a9fb6c9b4c39e05 (master) should fix this.
> > > 
> > > I try it out, thanks!
> > 
> > I don't get any more core files with master commit 67355779ecc.
> > Now my cosds just die - no stack trace in the log, no core
> > file, nothing in syslog or dmesg ...
> > 
> > I'm not sure how to track down what's happening here...
> 
> Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> available at the moment).  Seeing the last bit of the logs on the crashed 
> nodes will help.

I reproduced this morning using master branch commit 1a2e2a77f35c.
Still no core files that I can find.

Here's the last 50 lines of the log for some of the early 
cosd deaths.

--------
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.921947 7fa84cef0940 journal queue_completions_thru seq 23135 queueing seq 23131 0x7fa8414e8ff0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.921965 7fa84cef0940 journal queue_completions_thru seq 23135 queueing seq 23132 0x7fa84152cf90
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.921978 7fa84cef0940 journal queue_completions_thru seq 23135 queueing seq 23133 0x7fa8411ee180
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.921988 7fa84cef0940 journal queue_completions_thru seq 23135 queueing seq 23134 0x7fa841594f90
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922000 7fa84cef0940 journal queue_completions_thru seq 23135 queueing seq 23135 0x7fa84158cf90
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922013 7fa84cef0940 journal write_thread throttle finished 18 ops and 8724 bytes, now 4 ops and 1944 bytes
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922036 7fa84cef0940 journal room 511668223 max_size 526385152 pos 192196608 header.start 177483776 top 4096
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922044 7fa84cef0940 journal check_for_full at 192196608 : 8192 < 511668223
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922051 7fa84cef0940 journal prepare_single_write 1 will write 192196608 : seq 23136 len 486 -> 8192 (head 40 pre_pad 4056 ebl 486 post_pad 3570 tail 40) (ebl alignment 0)
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922079 7fa84cef0940 journal room 511660031 max_size 526385152 pos 192204800 header.start 177483776 top 4096
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922094 7fa84cef0940 journal check_for_full at 192204800 : 8192 < 511660031
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922102 7fa84cef0940 journal prepare_single_write 2 will write 192204800 : seq 23137 len 486 -> 8192 (head 40 pre_pad 4056 ebl 486 post_pad 3570 tail 40) (ebl alignment 0)
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922112 7fa84cef0940 journal room 511651839 max_size 526385152 pos 192212992 header.start 177483776 top 4096
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922122 7fa84cef0940 journal check_for_full at 192212992 : 8192 < 511651839
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922129 7fa84cef0940 journal prepare_single_write 3 will write 192212992 : seq 23138 len 486 -> 8192 (head 40 pre_pad 4056 ebl 486 post_pad 3570 tail 40) (ebl alignment 0)
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922153 7fa84cef0940 journal room 511643647 max_size 526385152 pos 192221184 header.start 177483776 top 4096
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922160 7fa84cef0940 journal check_for_full at 192221184 : 8192 < 511643647
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922167 7fa84cef0940 journal prepare_single_write 4 will write 192221184 : seq 23139 len 486 -> 8192 (head 40 pre_pad 4056 ebl 486 post_pad 3570 tail 40) (ebl alignment 0)
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922176 7fa84cef0940 journal prepare_multi_write queue_pos now 192229376
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.922183 7fa84cef0940 journal do_write writing 192196608~32768
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930753 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader got MSG
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930778 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader got envelope type=70 src osd83 front=61 data=0 off 0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930792 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930807 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader got front 61
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930821 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).aborted = 0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930832 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930851 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader got message 175 0x33397c0 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930864 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930879 7fa8266e6940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930899 7fa826ded940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930921 7fa826ded940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).write_ack 175
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930938 7fa826ded940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930948 7fa826ded940 -- 172.17.40.23:6817/24736 >> 172.17.40.32:6811/21905 pipe(0x7fa840e38100 sd=149 pgs=89 cs=1 l=0).writer sleeping
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930969 7fa8476e5940 -- 172.17.40.23:6817/24736 dispatch_entry pipe 0x7fa840e38100 dequeued 0x33397c0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930983 7fa8476e5940 -- 172.17.40.23:6817/24736 <== osd83 172.17.40.32:6811/21905 175 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (1790669248 0 0) 0x33397c0 con 0x7fa840297d50
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.930991 7fa8476e5940 osd21 7 heartbeat_dispatch 0x33397c0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.931005 7fa8476e5940 osd21 7 handle_osd_ping from osd83 got stat stat(2011-03-03 08:35:29.927476 oprate=0.135292 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.931020 7fa8476e5940 osd21 7 _share_map_incoming osd83 172.17.40.32:6811/21905 7
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.931036 7fa8476e5940 osd21 7 take_peer_stat peer osd83 stat(2011-03-03 08:35:29.927476 oprate=0.135292 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.931052 7fa8476e5940 -- 172.17.40.23:6817/24736 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.931063 7fa8476e5940 -- 172.17.40.23:6817/24736 done calling dispatch on 0x33397c0
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934258 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).reader couldn't read tag, Success
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934277 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).fault 0: Success
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934319 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).fault on lossy channel, failing
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934335 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).fail
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934351 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).stop
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934371 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).discard_queue
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934389 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1). dequeued pipe 
/var/log/ceph/osd.21.log:2011-03-03 08:35:29.934404 7fa8466e3940 -- 172.17.40.23:6815/24736 >> 172.17.40.34:6789/0 pipe(0x7fa8400013a0 sd=13 pgs=2573 cs=1 l=1).  discard 0x7fa82843a390
--------
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912317 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).reader got envelope type=70 src osd34 front=61 data=0 off 0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912332 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912365 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).reader got front 61
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912383 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).aborted = 0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912396 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912422 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).reader got message 285 0x7fb3b4761210 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912442 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912472 7fb3bfdd9940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912504 7fb3bd9b5940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912529 7fb3bd9b5940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).write_ack 285
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912562 7fb3ca474940 -- 172.17.40.22:6823/27793 dispatch_entry pipe 0x1ee7a30 dequeued 0x7fb3b4761210
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912594 7fb3ca474940 -- 172.17.40.22:6823/27793 <== osd34 172.17.40.25:6808/26003 285 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (3933685369 0 0) 0x7fb3b4761210 con 0x1f05d80
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912606 7fb3ca474940 osd15 7 heartbeat_dispatch 0x7fb3b4761210
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912625 7fb3ca474940 osd15 7 handle_osd_ping from osd34 got stat stat(2011-03-03 08:35:29.911371 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912640 7fb3bd9b5940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912653 7fb3bd9b5940 -- 172.17.40.22:6823/27793 >> 172.17.40.25:6808/26003 pipe(0x1ee7a30 sd=32 pgs=59 cs=1 l=0).writer sleeping
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912668 7fb3ca474940 osd15 7 _share_map_incoming osd34 172.17.40.25:6808/26003 7
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912686 7fb3ca474940 osd15 7 take_peer_stat peer osd34 stat(2011-03-03 08:35:29.911371 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912704 7fb3ca474940 -- 172.17.40.22:6823/27793 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.912713 7fb3ca474940 -- 172.17.40.22:6823/27793 done calling dispatch on 0x7fb3b4761210
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932667 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader got MSG
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932698 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader got envelope type=70 src osd43 front=61 data=0 off 0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932709 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932757 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader got front 61
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932769 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).aborted = 0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932778 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932797 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader got message 181 0x251fff0 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932813 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932831 7fb3c9371940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932844 7fb3ca474940 -- 172.17.40.22:6823/27793 dispatch_entry pipe 0x1c14620 dequeued 0x251fff0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932870 7fb3ca474940 -- 172.17.40.22:6823/27793 <== osd43 172.17.40.27:6811/21226 181 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (3987907583 0 0) 0x251fff0 con 0x1bded80
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932882 7fb3ca474940 osd15 7 heartbeat_dispatch 0x251fff0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932901 7fb3ca474940 osd15 7 handle_osd_ping from osd43 got stat stat(2011-03-03 08:35:29.931299 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932919 7fb3ca474940 osd15 7 _share_map_incoming osd43 172.17.40.27:6811/21226 7
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932942 7fb3ca474940 osd15 7 take_peer_stat peer osd43 stat(2011-03-03 08:35:29.931299 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932963 7fb3ca474940 -- 172.17.40.22:6823/27793 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.932976 7fb3ca474940 -- 172.17.40.22:6823/27793 done calling dispatch on 0x251fff0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933030 7fb3c0eea940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933049 7fb3c0eea940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).write_ack 181
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933078 7fb3c0eea940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933091 7fb3c0eea940 -- 172.17.40.22:6823/27793 >> 172.17.40.27:6811/21226 pipe(0x1c14620 sd=13 pgs=45 cs=1 l=0).writer sleeping
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933243 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).reader couldn't read tag, Success
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933263 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).fault 0: Success
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933286 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).fault on lossy channel, failing
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933313 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).fail
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933349 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).stop
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933369 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).discard_queue
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933391 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1). dequeued pipe 
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933411 7fb3c9472940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).  discard 0x7fb3c4ccc890
/var/log/ceph/osd.15.log:2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335
--------
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.908735 7f28c9208940 -- 172.17.40.32:6821/22338 --> mon0 172.17.40.34:6789/0 -- pg_stats(1616 pgs v 7) v1 -- ?+0 0x7f28bd1cb940
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.908767 7f28c9208940 -- 172.17.40.32:6821/22338 submit_message pg_stats(1616 pgs v 7) v1 remote, 172.17.40.34:6789/0, have pipe.
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913351 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader got MSG
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913386 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader got envelope type=70 src osd37 front=61 data=0 off 0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913396 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913410 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader got front 61
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913421 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).aborted = 0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913430 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913449 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader got message 293 0x7f28bd1f2200 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913464 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913484 7f28b95f1940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913525 7f28c29fb940 -- 172.17.40.32:6823/22338 dispatch_entry pipe 0x1a1e2d0 dequeued 0x7f28bd1f2200
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913554 7f28c29fb940 -- 172.17.40.32:6823/22338 <== osd37 172.17.40.25:6817/26313 293 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (1171319859 0 0) 0x7f28bd1f2200 con 0x19cc9c0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913568 7f28c29fb940 osd87 7 heartbeat_dispatch 0x7f28bd1f2200
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913590 7f28c29fb940 osd87 7 handle_osd_ping from osd37 got stat stat(2011-03-03 08:35:29.906027 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913613 7f28b7ad6940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913633 7f28b7ad6940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).write_ack 293
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913652 7f28c29fb940 osd87 7 _share_map_incoming osd37 172.17.40.25:6817/26313 7
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913677 7f28b7ad6940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913690 7f28b7ad6940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6817/26313 pipe(0x1a1e2d0 sd=37 pgs=63 cs=1 l=0).writer sleeping
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913707 7f28c29fb940 osd87 7 take_peer_stat peer osd37 stat(2011-03-03 08:35:29.906027 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913729 7f28c29fb940 -- 172.17.40.32:6823/22338 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.913738 7f28c29fb940 -- 172.17.40.32:6823/22338 done calling dispatch on 0x7f28bd1f2200
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916731 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader got MSG
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916749 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader got envelope type=70 src osd34 front=61 data=0 off 0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916760 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916773 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader got front 61
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916787 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).aborted = 0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916816 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916836 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader got message 283 0x7f28ac160cc0 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916852 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916868 7f28b0b67940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916894 7f28c29fb940 -- 172.17.40.32:6823/22338 dispatch_entry pipe 0x1a8b130 dequeued 0x7f28ac160cc0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916920 7f28c29fb940 -- 172.17.40.32:6823/22338 <== osd34 172.17.40.25:6808/26003 283 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (3933685369 0 0) 0x7f28ac160cc0 con 0x17bdd10
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916935 7f28c29fb940 osd87 7 heartbeat_dispatch 0x7f28ac160cc0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916957 7f28c29fb940 osd87 7 handle_osd_ping from osd34 got stat stat(2011-03-03 08:35:29.911371 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.916982 7f28b7bd7940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917003 7f28b7bd7940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).write_ack 283
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917029 7f28c29fb940 osd87 7 _share_map_incoming osd34 172.17.40.25:6808/26003 7
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917095 7f28b7bd7940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917113 7f28b7bd7940 -- 172.17.40.32:6823/22338 >> 172.17.40.25:6808/26003 pipe(0x1a8b130 sd=91 pgs=63 cs=1 l=0).writer sleeping
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917197 7f28c29fb940 osd87 7 take_peer_stat peer osd34 stat(2011-03-03 08:35:29.911371 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917220 7f28c29fb940 -- 172.17.40.32:6823/22338 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.917233 7f28c29fb940 -- 172.17.40.32:6823/22338 done calling dispatch on 0x7f28ac160cc0
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.934853 7f28c19f9940 -- 172.17.40.32:6821/22338 >> 172.17.40.34:6789/0 pipe(0x1184ce0 sd=13 pgs=2615 cs=1 l=1).reader couldn't read tag, Success
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.934880 7f28c19f9940 -- 172.17.40.32:6821/22338 >> 172.17.40.34:6789/0 pipe(0x1184ce0 sd=13 pgs=2615 cs=1 l=1).fault 0: Success
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.934894 7f28c19f9940 -- 172.17.40.32:6821/22338 >> 172.17.40.34:6789/0 pipe(0x1184ce0 sd=13 pgs=2615 cs=1 l=1).fault on lossy channel, failing
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.934906 7f28c19f9940 -- 172.17.40.32:6821/22338 >> 172.17.40.34:6789/0 pipe(0x1184ce0 sd=13 pgs=2615 cs=1 l=1).fail
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.934917 7f28c19f9940 -- 172.17.40.32:6821/22338 >> 172.17.40.34:6789/0 pipe(0x1184ce0 sd=13 pgs=2615 cs=1 l=1).stop
/var/log/ceph/osd.87.log:2011-03-03 08:35:29.934934 7f28c18f8940 -- 172.17.40.32:6821/22338 >> 172.17.40.34:6789/0 pipe(0x1184ce0 sd=13 pgs=2615 cs=1 l=1).do_sendmail short write did 133455, still have 74755
--------
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079134 7ff76aaaa940 -- 172.17.40.21:6805/22667 >> 172.17.40.33:6808/12863 pipe(0x2c0ef30 sd=156 pgs=86 cs=1 l=0).aborted = 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079145 7ff76aaaa940 -- 172.17.40.21:6805/22667 >> 172.17.40.33:6808/12863 pipe(0x2c0ef30 sd=156 pgs=86 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079166 7ff77a1e1940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).reader got envelope type=70 src osd74 front=61 data=0 off 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079185 7ff77a1e1940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).reader wants 61 from dispatch throttler 488/35000000
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079201 7ff77a1e1940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).reader got front 61
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079215 7ff77a1e1940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).aborted = 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079225 7ff77a1e1940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079244 7ff77a1e1940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).reader got message 307 0x38e9050 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079266 7ff77befe940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079278 7ff77befe940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).write_ack 307
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079292 7ff77befe940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079303 7ff77befe940 -- 172.17.40.21:6805/22667 >> 172.17.40.31:6808/23718 pipe(0x3405a20 sd=24 pgs=55 cs=1 l=0).writer sleeping
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079317 7ff77afef940 -- 172.17.40.21:6805/22667 >> 172.17.40.24:6823/27325 pipe(0x340fcd0 sd=17 pgs=109 cs=1 l=0).reader got MSG
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079339 7ff77b1f1940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6802/21594 pipe(0x31c97f0 sd=27 pgs=46 cs=1 l=0).reader got envelope type=70 src osd80 front=61 data=0 off 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079357 7ff77b1f1940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6802/21594 pipe(0x31c97f0 sd=27 pgs=46 cs=1 l=0).reader wants 61 from dispatch throttler 549/35000000
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079374 7ff77b1f1940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6802/21594 pipe(0x31c97f0 sd=27 pgs=46 cs=1 l=0).reader got front 61
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079387 7ff77b1f1940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6802/21594 pipe(0x31c97f0 sd=27 pgs=46 cs=1 l=0).aborted = 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079398 7ff77b1f1940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6802/21594 pipe(0x31c97f0 sd=27 pgs=46 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079415 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).reader got MSG
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079434 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).reader got envelope type=70 src osd85 front=61 data=0 off 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079445 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).reader wants 61 from dispatch throttler 610/35000000
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079472 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).reader got front 61
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079485 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).aborted = 0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079494 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079512 7ff7799d9940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).reader got message 289 0x39a6220 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079528 7ff77a5e5940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079538 7ff77a5e5940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).write_ack 289
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079551 7ff77a5e5940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079565 7ff77a5e5940 -- 172.17.40.21:6805/22667 >> 172.17.40.32:6817/22123 pipe(0x7ff77c589230 sd=30 pgs=8 cs=1 l=0).writer sleeping
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079578 7ff766464940 -- 172.17.40.21:6804/22667 >> 172.17.40.33:6801/12649 pipe(0x7ff77d7f6a90 sd=155 pgs=67 cs=1 l=0).write_ack 423
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079597 7ff766666940 -- 172.17.40.21:6804/22667 >> 172.17.40.33:6801/12649 pipe(0x7ff77d7f6a90 sd=155 pgs=67 cs=1 l=0).reader got message 424 0x7ff7705b5e60 pg_info(1 pgs e7) v1
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079619 7ff767f7f940 -- 172.17.40.21:6805/22667 >> 172.17.40.24:6814/27015 pipe(0x7ff77c74ea90 sd=141 pgs=84 cs=1 l=0).reader got MSG
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079639 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).fault on lossy channel, failing
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079654 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).fail
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079663 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).stop
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079673 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).discard_queue
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079683 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1). dequeued pipe 
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.079694 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff7702f91f0
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.080250 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff77023ef30
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.080810 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff770371050
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.081353 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff770402050
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.081883 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff771282e90
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.082366 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff7712e1050
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.082901 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff770cfc050
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.083368 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff770c40910
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.083855 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff77007dd40
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.084407 7ff783751940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).  discard 0x7ff770f56010
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.085122 7ff783650940 -- 172.17.40.21:6803/22667 >> 172.17.40.34:6789/0 pipe(0x7ff77c000ef0 sd=13 pgs=2582 cs=1 l=1).do_sendmail short write did 205176, still have 53378
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.085146 7ff784f54940 -- 172.17.40.21:6804/22667 dispatch_throttle_release 316 to dispatch throttler 632/35000000
/var/log/ceph/osd.1.log:2011-03-03 08:35:30.085161 7ff784f54940 -- 172.17.40.21:6804/22667 done calling dispatch on 0x7ff7706c4590
--------
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.509143 7fbd1ba60940 -- 172.17.40.33:6802/12649 done calling dispatch on 0x29ccab0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521556 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader got MSG
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521593 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader got envelope type=70 src osd84 front=61 data=0 off 0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521603 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521617 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader got front 61
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521629 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).aborted = 0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521638 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521656 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader got message 189 0x7fbd0d5d41f0 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521672 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521694 7fbd06dcd940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521717 7fbd1ba60940 -- 172.17.40.33:6802/12649 dispatch_entry pipe 0x23aa8e0 dequeued 0x7fbd0d5d41f0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521741 7fbd1ba60940 -- 172.17.40.33:6802/12649 <== osd84 172.17.40.32:6814/22015 189 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (1356160346 0 0) 0x7fbd0d5d41f0 con 0x23aab50
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521762 7fbd1ba60940 osd88 7 heartbeat_dispatch 0x7fbd0d5d41f0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521781 7fbd1ba60940 osd88 7 handle_osd_ping from osd84 got stat stat(2011-03-03 08:35:45.514326 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521797 7fbd1ba60940 osd88 7 _share_map_incoming osd84 172.17.40.32:6814/22015 7
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521818 7fbd1ba60940 osd88 7 take_peer_stat peer osd84 stat(2011-03-03 08:35:45.514326 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521844 7fbd1ba60940 -- 172.17.40.33:6802/12649 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521881 7fbd1ba60940 -- 172.17.40.33:6802/12649 done calling dispatch on 0x7fbd0d5d41f0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521903 7fbd06ccc940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521921 7fbd06ccc940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).write_ack 189
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521940 7fbd06ccc940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.521953 7fbd06ccc940 -- 172.17.40.33:6802/12649 >> 172.17.40.32:6814/22015 pipe(0x23aa8e0 sd=81 pgs=103 cs=1 l=0).writer sleeping
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531760 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader got MSG
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531785 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader got envelope type=70 src osd70 front=61 data=0 off 0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531801 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531828 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader got front 61
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531846 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).aborted = 0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531860 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531886 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader got message 181 0x29ccab0 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531907 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531942 7fbd058b8940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531973 7fbd01575940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.531993 7fbd01575940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).write_ack 181
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532007 7fbd01575940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532017 7fbd01575940 -- 172.17.40.33:6802/12649 >> 172.17.40.30:6820/23749 pipe(0x7fbd0c17a220 sd=34 pgs=113 cs=1 l=0).writer sleeping
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532166 7fbd1ba60940 -- 172.17.40.33:6802/12649 dispatch_entry pipe 0x7fbd0c17a220 dequeued 0x29ccab0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532185 7fbd1ba60940 -- 172.17.40.33:6802/12649 <== osd70 172.17.40.30:6820/23749 181 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (3167970972 0 0) 0x29ccab0 con 0x7fbd0c17a490
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532192 7fbd1ba60940 osd88 7 heartbeat_dispatch 0x29ccab0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532206 7fbd1ba60940 osd88 7 handle_osd_ping from osd70 got stat stat(2011-03-03 08:35:45.524509 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532217 7fbd1ba60940 osd88 7 _share_map_incoming osd70 172.17.40.30:6820/23749 7
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532233 7fbd1ba60940 osd88 7 take_peer_stat peer osd70 stat(2011-03-03 08:35:45.524509 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532246 7fbd1ba60940 -- 172.17.40.33:6802/12649 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.532255 7fbd1ba60940 -- 172.17.40.33:6802/12649 done calling dispatch on 0x29ccab0
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.560887 7fbd1a95d940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).reader couldn't read tag, Success
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.560929 7fbd1a95d940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).fault 0: Success
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.560947 7fbd1a95d940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).fault on lossy channel, failing
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.560964 7fbd1a95d940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).fail
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.560979 7fbd1a95d940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).stop
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.560996 7fbd1a95d940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).discard_queue
/var/log/ceph/osd.88.log:2011-03-03 08:35:45.561019 7fbd1aa5e940 -- 172.17.40.33:6800/12649 >> 172.17.40.34:6789/0 pipe(0x7fbd14000a70 sd=12 pgs=2627 cs=1 l=1).do_sendmail short write did 129480, still have 364826
--------
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.402368 7f34c41f2940 -- 172.17.40.31:6808/23718 done calling dispatch on 0x7f34a4341000
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.477963 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader got MSG
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.477990 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader got envelope type=70 src osd0 front=61 data=0 off 0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478031 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478052 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader got front 61
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478069 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).aborted = 0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478082 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478108 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader got message 297 0x29f3a90 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478129 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478155 7f34c06e8940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478184 7f34c41f2940 -- 172.17.40.31:6808/23718 dispatch_entry pipe 0x258bd90 dequeued 0x29f3a90
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478203 7f34c41f2940 -- 172.17.40.31:6808/23718 <== osd0 172.17.40.21:6802/22558 297 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (1206220517 0 0) 0x29f3a90 con 0x23cdf00
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478235 7f34c41f2940 osd74 7 heartbeat_dispatch 0x29f3a90
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478255 7f34c41f2940 osd74 7 handle_osd_ping from osd0 got stat stat(2011-03-03 08:35:45.470964 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478282 7f34b9de4940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478310 7f34b9de4940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).write_ack 297
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478332 7f34c41f2940 osd74 7 _share_map_incoming osd0 172.17.40.21:6802/22558 7
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478356 7f34c41f2940 osd74 7 take_peer_stat peer osd0 stat(2011-03-03 08:35:45.470964 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478387 7f34b9de4940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478405 7f34b9de4940 -- 172.17.40.31:6808/23718 >> 172.17.40.21:6802/22558 pipe(0x258bd90 sd=26 pgs=77 cs=1 l=0).writer sleeping
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478422 7f34c41f2940 -- 172.17.40.31:6808/23718 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.478433 7f34c41f2940 -- 172.17.40.31:6808/23718 done calling dispatch on 0x29f3a90
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524290 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader got MSG
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524312 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader got envelope type=70 src osd9 front=61 data=0 off 0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524323 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader wants 61 from dispatch throttler 0/35000000
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524337 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader got front 61
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524350 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).aborted = 0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524359 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader got 61 + 0 + 0 byte message
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524380 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader got message 186 0x29f3a90 osd_ping(e7 as_of 7) v1
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524413 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).queue_received queuing pipe
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524434 7f34b2067940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).reader reading tag...
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524450 7f34c41f2940 -- 172.17.40.31:6808/23718 dispatch_entry pipe 0x2098f30 dequeued 0x29f3a90
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524471 7f34c41f2940 -- 172.17.40.31:6808/23718 <== osd9 172.17.40.22:6805/27174 186 ==== osd_ping(e7 as_of 7) v1 ==== 61+0+0 (2221755111 0 0) 0x29f3a90 con 0x2b67a70
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524505 7f34c41f2940 osd74 7 heartbeat_dispatch 0x29f3a90
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524533 7f34c41f2940 osd74 7 handle_osd_ping from osd9 got stat stat(2011-03-03 08:35:45.516304 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524551 7f34c41f2940 osd74 7 _share_map_incoming osd9 172.17.40.22:6805/27174 7
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524578 7f34c41f2940 osd74 7 take_peer_stat peer osd9 stat(2011-03-03 08:35:45.516304 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524594 7f34c41f2940 -- 172.17.40.31:6808/23718 dispatch_throttle_release 61 to dispatch throttler 61/35000000
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524604 7f34c41f2940 -- 172.17.40.31:6808/23718 done calling dispatch on 0x29f3a90
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524731 7f34b2e75940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524749 7f34b2e75940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).write_ack 186
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524765 7f34b2e75940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).writer: state = 2 policy.server=0
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.524775 7f34b2e75940 -- 172.17.40.31:6808/23718 >> 172.17.40.22:6805/27174 pipe(0x2098f30 sd=92 pgs=66 cs=1 l=0).writer sleeping
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561000 7f34c31f0940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).reader couldn't read tag, Success
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561022 7f34c31f0940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).fault 0: Success
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561042 7f34c31f0940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).fault on lossy channel, failing
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561060 7f34c31f0940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).fail
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561078 7f34c31f0940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).stop
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561097 7f34c31f0940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).discard_queue
/var/log/ceph/osd.74.log:2011-03-03 08:35:45.561126 7f34cf1da940 -- 172.17.40.31:6806/23718 >> 172.17.40.34:6789/0 pipe(0x7f34bc000ea0 sd=12 pgs=2631 cs=1 l=1).do_sendmail short write did 143424, still have 8838
--------

> 
> I pushed a fix for the chdir issue, though!

Thanks!

-- Jim

> 
> Thanks-
> sage
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03  5:03                                 ` Sage Weil
  2011-03-03 16:35                                   ` Jim Schutt
@ 2011-03-03 17:28                                   ` Jim Schutt
  2011-03-03 18:04                                     ` Sage Weil
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 17:28 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > I'm not sure how to track down what's happening here...
> 
> Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> available at the moment).  Seeing the last bit of the logs on the crashed 
> nodes will help.
> 

So this might be interesting.  In my last email, osd.15.log ended with

2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335


It occurred to me you might like to know what thread
7fb3d545c940 was doing when it got that short write:

# grep 7fb3d545c940 osd.15.log | tail
2011-03-03 08:32:33.108190 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 45 0x7fb3c4ad6970 pg_stats(1228 pgs v 6) v1
2011-03-03 08:32:33.114972 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 45 0x7fb3c4ad6970
2011-03-03 08:32:33.115001 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4ad6970
2011-03-03 08:34:01.154979 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer: state = 2 policy.server=0
2011-03-03 08:34:01.154991 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_keepalive
2011-03-03 08:34:01.155010 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_ack 29
2011-03-03 08:34:01.155041 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 46 0x7fb3c4b9fd90 pg_stats(1228 pgs v 6) v1
2011-03-03 08:34:01.163035 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 46 0x7fb3c4b9fd90
2011-03-03 08:34:01.163069 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4b9fd90
2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335

I assume this means the short write happened on sending
pg_stats? 172.17.40.34 is where my monitor is running.
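
For what it's worth, my understanding of that "short write" message is
just that the kernel accepted only part of the buffer, presumably
because the socket to the mon had backed up.  Here's a tiny standalone
illustration of the same situation -- nothing Ceph-specific, just a
non-blocking unix socket with a default-sized buffer whose peer never
reads:

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
    perror("socketpair");
    return 1;
  }
  fcntl(sv[0], F_SETFL, O_NONBLOCK);   /* writer must not block */

  static char buf[1 << 20];
  memset(buf, 'x', sizeof(buf));

  /* nobody reads sv[1], so the socket buffer fills and send()
   * returns a partial count -- a short write */
  ssize_t n = send(sv[0], buf, sizeof(buf), 0);
  if (n < 0) {
    perror("send");
    return 1;
  }
  printf("asked to send %zu, kernel took %zd, still have %zd\n",
         sizeof(buf), n, (ssize_t)sizeof(buf) - n);

  close(sv[0]);
  close(sv[1]);
  return 0;
}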

-- Jim




^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 17:28                                   ` Jim Schutt
@ 2011-03-03 18:04                                     ` Sage Weil
  2011-03-03 18:42                                       ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-03 18:04 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 3 Mar 2011, Jim Schutt wrote:
> 
> On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > > I'm not sure how to track down what's happening here...
> > 
> > Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> > available at the moment).  Seeing the last bit of the logs on the crashed 
> > nodes will help.
> > 

Can you confirm that the chdir is working now?  Maybe put an assert(0) in 
tick() so we can verify core dumps are working in general?
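
To be clear about what I mean, here's a throwaway standalone sketch
(not the actual OSD code, just the shape of the experiment): a failed
assert raises SIGABRT, and with "ulimit -c unlimited" that should
leave a core in the working directory.

#include <cassert>
#include <cstdio>
#include <unistd.h>

/* standalone sketch, not the actual OSD::tick() */
static void tick()
{
  printf("tick\n");
  assert(0);          /* first tick trips this: SIGABRT -> core in cwd */
}

int main()
{
  for (;;) {
    tick();
    sleep(1);
  }
}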

Also, can you confirm that there's nothing interesting in dmesg on these 
nodes (like OOM)?

Thanks-
sage


> 
> So this might be interesting.  In my last email, osd.15.log ended with
> 
> 2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335
> 
> 
> It occurred to me you might like to know what thread
> 7fb3d545c940 was doing when it got that short write:
> 
> # grep 7fb3d545c940 osd.15.log | tail
> 2011-03-03 08:32:33.108190 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 45 0x7fb3c4ad6970 pg_stats(1228 pgs v 6) v1
> 2011-03-03 08:32:33.114972 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 45 0x7fb3c4ad6970
> 2011-03-03 08:32:33.115001 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4ad6970
> 2011-03-03 08:34:01.154979 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer: state = 2 policy.server=0
> 2011-03-03 08:34:01.154991 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_keepalive
> 2011-03-03 08:34:01.155010 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_ack 29
> 2011-03-03 08:34:01.155041 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 46 0x7fb3c4b9fd90 pg_stats(1228 pgs v 6) v1
> 2011-03-03 08:34:01.163035 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 46 0x7fb3c4b9fd90
> 2011-03-03 08:34:01.163069 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4b9fd90
> 2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335
> 
> I assume this means the short write happened on sending
> pg_stats? 172.17.40.34 is where my monitor is running.
> 
> -- Jim
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 18:04                                     ` Sage Weil
@ 2011-03-03 18:42                                       ` Jim Schutt
  2011-03-03 18:51                                         ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 18:42 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-03-03 at 11:04 -0700, Sage Weil wrote:
> On Thu, 3 Mar 2011, Jim Schutt wrote:
> > 
> > On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > > > I'm not sure how to track down what's happening here...
> > > 
> > > Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> > > available at the moment).  Seeing the last bit of the logs on the crashed 
> > > nodes will help.
> > > 
> 
> Can you confirm that the chdir is working now?  Maybe put an assert(0) in 
> tick() so we can verify core dumps are working in general?

Great idea, and chdir is definitely working; got 96 core 
files as expected.

> 
> Also, can you confirm that there's nothing interesting in dmesg on these 
> nodes (like OOM)?

The only thing even remotely interesting is the occasional
btrfs message such as:
  [ 7778.199273] btrfs: unlinked 1 orphans
  [69347.002760] btrfs: truncated 1 orphans

Otherwise, no kernel stack traces of the sort I'm
used to seeing; 'dmesg | egrep -i "oom|mem|btrfs"'
only shows those orphan messages.

-- Jim

> 
> Thanks-
> sage
> 
> 
> > 
> > So this might be interesting.  In my last email, osd.15.log ended with
> > 
> > 2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335
> > 
> > 
> > It occurred to me you might like to know what thread
> > 7fb3d545c940 was doing when it got that short write:
> > 
> > # grep 7fb3d545c940 osd.15.log | tail
> > 2011-03-03 08:32:33.108190 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 45 0x7fb3c4ad6970 pg_stats(1228 pgs v 6) v1
> > 2011-03-03 08:32:33.114972 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 45 0x7fb3c4ad6970
> > 2011-03-03 08:32:33.115001 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4ad6970
> > 2011-03-03 08:34:01.154979 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer: state = 2 policy.server=0
> > 2011-03-03 08:34:01.154991 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_keepalive
> > 2011-03-03 08:34:01.155010 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_ack 29
> > 2011-03-03 08:34:01.155041 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 46 0x7fb3c4b9fd90 pg_stats(1228 pgs v 6) v1
> > 2011-03-03 08:34:01.163035 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 46 0x7fb3c4b9fd90
> > 2011-03-03 08:34:01.163069 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4b9fd90
> > 2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmail short write did 195207, still have 91335
> > 
> > I assume this means the short write happened on sending
> > pg_stats? 172.17.40.34 is where my monitor is running.
> > 
> > -- Jim
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 18:42                                       ` Jim Schutt
@ 2011-03-03 18:51                                         ` Sage Weil
  2011-03-03 19:39                                           ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-03 18:51 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 3 Mar 2011, Jim Schutt wrote:
> On Thu, 2011-03-03 at 11:04 -0700, Sage Weil wrote:
> > On Thu, 3 Mar 2011, Jim Schutt wrote:
> > > 
> > > On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > > > > I'm not sure how to track down what's happening here...
> > > > 
> > > > Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> > > > available at the moment).  Seeing the last bit of the logs on the crashed 
> > > > nodes will help.
> > > > 
> > 
> > Can you confirm that the chdir is working now?  Maybe put an assert(0) in 
> > tick() so we can verify core dumps are working in general?
> 
> Great idea, and chdir is definitely working; got 96 core 
> files as expected.

Can you put an assert(0) at the top of OSD::shutdown() so we can verify 
that the OSD isn't trying to shut itself down cleanly?  (There are a few 
cases where it might do that.)  The logs you had make it look a bit like 
that could be the case.  Or that it is crashing in an unpleasant way in 
the messenger pipe teardown.

Thanks!
sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 18:51                                         ` Sage Weil
@ 2011-03-03 19:39                                           ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 19:39 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-03-03 at 11:51 -0700, Sage Weil wrote:
> On Thu, 3 Mar 2011, Jim Schutt wrote:
> > On Thu, 2011-03-03 at 11:04 -0700, Sage Weil wrote:
> > > On Thu, 3 Mar 2011, Jim Schutt wrote:
> > > > 
> > > > On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > > > > > I'm not sure how to track down what's happening here...
> > > > > 
> > > > > Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> > > > > available at the moment).  Seeing the last bit of the logs on the crashed 
> > > > > nodes will help.
> > > > > 
> > > 
> > > Can you confirm that the chdir is working now?  Maybe put an assert(0) in 
> > > tick() so we can verify core dumps are working in general?
> > 
> > Great idea, and chdir is definitely working; got 96 core 
> > files as expected.
> 
> Can you put an assert(0) at the top of OSD::shutdown() so we can verify 
> that the OSD isn't trying to shut itself down cleanly?  (There are a few 
> cases where it might do that.)  The logs you had make it look a bit like 
> that could be the case.  Or that it is crashing in an unpleasant way in 
> the messenger pipe teardown.

No luck there.  Dead OSDs, but no core files.

FWIW, I've got a patch for init-ceph that lets
me run every daemon instance under valgrind and
log its output to a separate file.  I could try 
that if you think it might be useful.

Things run pretty slowly that way, so if there's
other testing you'd like me to try I should do
it first.

-- Jim

> 
> Thanks!
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03  2:26                                 ` Colin McCabe
@ 2011-03-03 20:03                                   ` Jim Schutt
  2011-03-03 20:47                                     ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 20:03 UTC (permalink / raw)
  To: Colin McCabe; +Cc: Sage Weil, Gregory Farnum, ceph-devel


On Wed, 2011-03-02 at 19:26 -0700, Colin McCabe wrote:
> Hi Jim,
> 
> We have seen this problem before. The usual suspects are the oom
> killer (grep for "out of memory" in syslog).
> Unfortunately, SIGKILL is uncatchable and that's what the OOM killer sends.
> 
> Another problem that can prevent core files from being generated is
> bad ulimit -c settings or a bad setting for core_pattern and friends.
> One problem I have a lot too is that the partition I'm writing core
> files to fills up.
> 
> If none of that works, it's possible that someone is calling exit()
> somewhere. You can attach a gdb to the process and put a breakpoint on
> exit() to see if this is going on. There's a lot of "your foo is not
> bar enough, I hate your config, exit(1)" type code that gets executed
> while the daemon is starting up. It sounds like you should be past
> that point, though.

I've finally gotten a little info, using a variant of
your gdb idea: I waited until many of the OSD instances
had died, then I attached gdb to several that were left,
and waited.

Two of them died the same way, like this:

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7fd7888c8940 (LWP 28693)]
0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
#1  0x0000000000672e0b in SimpleMessenger::Pipe::do_sendmsg (
    this=0x7fd799b67c20, sd=13, msg=0x7fd7888c7f20, len=251237, more=false)
    at msg/SimpleMessenger.cc:1994
#2  0x00000000006739d3 in SimpleMessenger::Pipe::write_message (
    this=0x7fd799b67c20, m=0x7fd79b2dcb70) at msg/SimpleMessenger.cc:2217
#3  0x000000000067e74a in SimpleMessenger::Pipe::writer (this=0x7fd799b67c20)
    at msg/SimpleMessenger.cc:1734
#4  0x000000000066fa2b in SimpleMessenger::Pipe::Writer::entry (
    this=0x7fd799b67e70) at msg/SimpleMessenger.h:204
#5  0x000000000068282e in Thread::_entry_func (arg=0x7fd799b67e70)
    at ./common/Thread.h:41
#6  0x00007fd7a9b7b73d in start_thread (arg=<value optimized out>)
    at pthread_create.c:301
#7  0x00007fd7a8a91f6d in clone () from /lib64/libc.so.6
(gdb) 


Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7f1aed7f3940 (LWP 28726)]
0x00007f1b01238f2b in sendmsg () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f1b01238f2b in sendmsg () from /lib64/libpthread.so.0
#1  0x0000000000672e0b in SimpleMessenger::Pipe::do_sendmsg (
    this=0x7f1af15c94d0, sd=114, msg=0x7f1aed7f2f20, len=126728, more=false)
    at msg/SimpleMessenger.cc:1994
#2  0x00000000006739d3 in SimpleMessenger::Pipe::write_message (
    this=0x7f1af15c94d0, m=0x23d3010) at msg/SimpleMessenger.cc:2217
#3  0x000000000067e74a in SimpleMessenger::Pipe::writer (this=0x7f1af15c94d0)
    at msg/SimpleMessenger.cc:1734
#4  0x000000000066fa2b in SimpleMessenger::Pipe::Writer::entry (
    this=0x7f1af15c9720) at msg/SimpleMessenger.h:204
#5  0x000000000068282e in Thread::_entry_func (arg=0x7f1af15c9720)
    at ./common/Thread.h:41
#6  0x00007f1b0123173d in start_thread (arg=<value optimized out>)
    at pthread_create.c:301
#7  0x00007f1b00147f6d in clone () from /lib64/libc.so.6

The third also got 

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7f531fefe940 (LWP 28700)]
0x00007f533ffeaf2b in sendmsg () from /lib64/libpthread.so.0
(gdb) 

but something was a little different and I didn't get a 
backtrace from it.

-- Jim


> 
> Colin
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 20:03                                   ` Jim Schutt
@ 2011-03-03 20:47                                     ` Jim Schutt
  2011-03-03 20:55                                       ` Yehuda Sadeh Weinraub
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 20:47 UTC (permalink / raw)
  To: Colin McCabe; +Cc: Sage Weil, Gregory Farnum, ceph-devel


On Thu, 2011-03-03 at 13:03 -0700, Jim Schutt wrote:
> > If none of that works, it's possible that someone is calling exit()
> > somewhere. You can attach a gdb to the process and put a breakpoint on
> > exit() to see if this is going on. There's a lot of "your foo is not
> > bar enough, I hate your config, exit(1)" type code that gets executed
> > while the daemon is starting up. It sounds like you should be past
> > that point, though.
> 
> I've finally gotten a little info, using a variant of
> your gdb idea: I waited until many of the OSD instances
> had died, then I attached gdb to several that were left,
> and waited.
> 
> Two of them died the same way, like this:
> 
> Program received signal SIGPIPE, Broken pipe.
> [Switching to Thread 0x7fd7888c8940 (LWP 28693)]
> 0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
> #1  0x0000000000672e0b in SimpleMessenger::Pipe::do_sendmsg (
>     this=0x7fd799b67c20, sd=13, msg=0x7fd7888c7f20, len=251237, more=false)
>     at msg/SimpleMessenger.cc:1994
> #2  0x00000000006739d3 in SimpleMessenger::Pipe::write_message (
>     this=0x7fd799b67c20, m=0x7fd79b2dcb70) at msg/SimpleMessenger.cc:2217
> #3  0x000000000067e74a in SimpleMessenger::Pipe::writer (this=0x7fd799b67c20)
>     at msg/SimpleMessenger.cc:1734
> #4  0x000000000066fa2b in SimpleMessenger::Pipe::Writer::entry (
>     this=0x7fd799b67e70) at msg/SimpleMessenger.h:204
> #5  0x000000000068282e in Thread::_entry_func (arg=0x7fd799b67e70)
>     at ./common/Thread.h:41
> #6  0x00007fd7a9b7b73d in start_thread (arg=<value optimized out>)
>     at pthread_create.c:301
> #7  0x00007fd7a8a91f6d in clone () from /lib64/libc.so.6
> (gdb) 
> 

Has something maybe changed in signal handling recently?

Maybe SIGPIPE used to be blocked, and sendmsg() would
return -EPIPE, but now it's not blocked and not handled?

This bit in linux-2.6.git/net/core/stream.c is what made
me wonder, but maybe it's a red herring:

int sk_stream_error(struct sock *sk, int flags, int err)
{
	if (err == -EPIPE)
		err = sock_error(sk) ? : -EPIPE;
	if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
		send_sig(SIGPIPE, current, 0);
	return err;
}

-- Jim





* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 20:47                                     ` Jim Schutt
@ 2011-03-03 20:55                                       ` Yehuda Sadeh Weinraub
  2011-03-03 21:45                                         ` Jim Schutt
  2011-03-03 21:53                                         ` Colin McCabe
  0 siblings, 2 replies; 94+ messages in thread
From: Yehuda Sadeh Weinraub @ 2011-03-03 20:55 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Colin McCabe, Sage Weil, Gregory Farnum, ceph-devel

On Thu, Mar 3, 2011 at 12:47 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> Has something maybe changed in signal handling recently?
>
> Maybe SIGPIPE used to be blocked, and sendmsg() would
> return -EPIPE, but now it's not blocked and not handled?
>
> This bit in linux-2.6.git/net/core/stream.c is what made
> me wonder, but maybe it's a red herring:
>
> int sk_stream_error(struct sock *sk, int flags, int err)
> {
>        if (err == -EPIPE)
>                err = sock_error(sk) ? : -EPIPE;
>        if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
>                send_sig(SIGPIPE, current, 0);
>        return err;
> }

It was actually just changed at
35c4a9ffeadfe202b247c8e23719518a874f54e6, so if you're on latest
master then it might be it. You can try reverting that commit, or can
try this:

index da22c7c..6f746d4 100644
--- a/src/msg/SimpleMessenger.cc
+++ b/src/msg/SimpleMessenger.cc
@@ -1991,7 +1991,7 @@ int SimpleMessenger::Pipe::do_sendmsg(int sd,
struct msghdr *msg, int len, bool
       assert(l == len);
     }

-    int r = ::sendmsg(sd, msg, more ? MSG_MORE : 0);
+    int r = ::sendmsg(sd, msg, MSG_NOSIGNAL | (more ? MSG_MORE : 0));
     if (r == 0)
       dout(10) << "do_sendmsg hmm do_sendmsg got r==0!" << dendl;
     if (r < 0) {
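
The difference is easy to see in isolation; a minimal sketch, assuming
Linux, with an AF_UNIX socketpair standing in for a connection whose
peer has gone away (no Ceph code involved):

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
    perror("socketpair");
    return 1;
  }
  close(sv[1]);   // the peer goes away, like a dropped osd connection

  char byte = 0;
  // With MSG_NOSIGNAL this fails with -1/EPIPE; without the flag (and
  // SIGPIPE at its default disposition) the process is killed instead.
  ssize_t r = send(sv[0], &byte, 1, MSG_NOSIGNAL);
  if (r < 0)
    printf("send: %s\n", strerror(errno));
  return 0;
}

Unhandled SIGPIPE out of sendmsg() is exactly what the backtraces
earlier in the thread show.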


Yehuda
>
> -- Jim
>
>
>


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 20:55                                       ` Yehuda Sadeh Weinraub
@ 2011-03-03 21:45                                         ` Jim Schutt
  2011-03-03 22:22                                           ` Sage Weil
  2011-03-03 21:53                                         ` Colin McCabe
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 21:45 UTC (permalink / raw)
  To: Yehuda Sadeh Weinraub; +Cc: Colin McCabe, Sage Weil, Gregory Farnum, ceph-devel


On Thu, 2011-03-03 at 13:55 -0700, Yehuda Sadeh Weinraub wrote:
> On Thu, Mar 3, 2011 at 12:47 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> > Has something maybe changed in signal handling recently?
> >
> > Maybe SIGPIPE used to be blocked, and sendmsg() would
> > return -EPIPE, but now it's not blocked and not handled?
> >
> > This bit in linux-2.6.git/net/core/stream.c is what made
> > me wonder, but maybe it's a red herring:
> >
> > int sk_stream_error(struct sock *sk, int flags, int err)
> > {
> >        if (err == -EPIPE)
> >                err = sock_error(sk) ? : -EPIPE;
> >        if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
> >                send_sig(SIGPIPE, current, 0);
> >        return err;
> > }
> 
> It was actually just changed at
> 35c4a9ffeadfe202b247c8e23719518a874f54e6, so if you're on latest
> master then it might be it. You can try reverting that commit, or can
> try this:
> 
> index da22c7c..6f746d4 100644
> --- a/src/msg/SimpleMessenger.cc
> +++ b/src/msg/SimpleMessenger.cc
> @@ -1991,7 +1991,7 @@ int SimpleMessenger::Pipe::do_sendmsg(int sd,
> struct msghdr *msg, int len, bool
>        assert(l == len);
>      }
> 
> -    int r = ::sendmsg(sd, msg, more ? MSG_MORE : 0);
> +    int r = ::sendmsg(sd, msg, MSG_NOSIGNAL | (more ? MSG_MORE : 0));
>      if (r == 0)
>        dout(10) << "do_sendmsg hmm do_sendmsg got r==0!" << dendl;
>      if (r < 0) {
> 

That seems to have fixed this issue.  At least, before
OSDs would start dying within a few minutes of starting
up a new file system; it's been over a half hour since
I started one up with this patch, and all OSDs are still
running.

Thanks!

-- Jim

> 
> Yehuda
> >
> > -- Jim
> >
> >
> >
> >
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 20:55                                       ` Yehuda Sadeh Weinraub
  2011-03-03 21:45                                         ` Jim Schutt
@ 2011-03-03 21:53                                         ` Colin McCabe
  2011-03-03 23:06                                           ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Colin McCabe @ 2011-03-03 21:53 UTC (permalink / raw)
  To: Yehuda Sadeh Weinraub; +Cc: Jim Schutt, Sage Weil, Gregory Farnum, ceph-devel

Oh, SIGPIPE, my old nemesis. I should have guessed!

I think it's time to block SIGPIPE everywhere... It's much better to
get EPIPE than to use a signal handler for this.
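
Roughly this at daemon startup; just a sketch of the idea, not the
actual change:

#include <signal.h>
#include <string.h>
#include <stdio.h>

int main()
{
  // Ignore SIGPIPE process-wide: any later write()/send() to a peer
  // that has gone away then fails with EPIPE instead of killing us.
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  if (sigaction(SIGPIPE, &sa, NULL) < 0) {
    perror("sigaction");
    return 1;
  }
  // ... rest of daemon startup ...
  return 0;
}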

regards,
Colin


On Thu, Mar 3, 2011 at 12:55 PM, Yehuda Sadeh Weinraub
<yehudasa@gmail.com> wrote:
> On Thu, Mar 3, 2011 at 12:47 PM, Jim Schutt <jaschut@sandia.gov> wrote:
>> Has something maybe changed in signal handling recently?
>>
>> Maybe SIGPIPE used to be blocked, and sendmsg() would
>> return -EPIPE, but now it's not blocked and not handled?
>>
>> This bit in linux-2.6.git/net/core/stream.c is what made
>> me wonder, but maybe it's a red herring:
>>
>> int sk_stream_error(struct sock *sk, int flags, int err)
>> {
>>        if (err == -EPIPE)
>>                err = sock_error(sk) ? : -EPIPE;
>>        if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
>>                send_sig(SIGPIPE, current, 0);
>>        return err;
>> }
>
> It was actually just changed at
> 35c4a9ffeadfe202b247c8e23719518a874f54e6, so if you're on latest
> master then it might be it. You can try reverting that commit, or can
> try this:
>
> index da22c7c..6f746d4 100644
> --- a/src/msg/SimpleMessenger.cc
> +++ b/src/msg/SimpleMessenger.cc
> @@ -1991,7 +1991,7 @@ int SimpleMessenger::Pipe::do_sendmsg(int sd,
> struct msghdr *msg, int len, bool
>       assert(l == len);
>     }
>
> -    int r = ::sendmsg(sd, msg, more ? MSG_MORE : 0);
> +    int r = ::sendmsg(sd, msg, MSG_NOSIGNAL | (more ? MSG_MORE : 0));
>     if (r == 0)
>       dout(10) << "do_sendmsg hmm do_sendmsg got r==0!" << dendl;
>     if (r < 0) {
>
>
> Yehuda
>>
>> -- Jim
>>
>>
>>


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 21:45                                         ` Jim Schutt
@ 2011-03-03 22:22                                           ` Sage Weil
  2011-03-03 22:34                                             ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-03 22:22 UTC (permalink / raw)
  To: Jim Schutt
  Cc: Yehuda Sadeh Weinraub, Colin McCabe, Gregory Farnum, ceph-devel

On Thu, 3 Mar 2011, Jim Schutt wrote:
> That seems to have fixed this issue.  At least, before
> OSDs would start dying within a few minutes of starting
> up a new file system; it's been over a half hour since
> I started one up with this patch, and all OSDs are still
> running.

That's good news; good catch with SIGPIPE!

Does that mean the original problem with the OSDs getting marked down has 
also gone away for you too?

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 22:22                                           ` Sage Weil
@ 2011-03-03 22:34                                             ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 22:34 UTC (permalink / raw)
  To: Sage Weil; +Cc: Yehuda Sadeh Weinraub, Colin McCabe, Gregory Farnum, ceph-devel


On Thu, 2011-03-03 at 15:22 -0700, Sage Weil wrote:
> On Thu, 3 Mar 2011, Jim Schutt wrote:
> > That seems to have fixed this issue.  At least, before
> > OSDs would start dying within a few minutes of starting
> > up a new file system; it's been over a half hour since
> > I started one up with this patch, and all OSDs are still
> > running.
> 
> That's good news; good catch with SIGPIPE!
> 
> Does that mean the original problem with the OSDs getting marked down has 
> also gone away for you too?

I don't know yet - I'm just starting to retest that.

I'll let you know as soon as I know something.

Thanks -- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 21:53                                         ` Colin McCabe
@ 2011-03-03 23:06                                           ` Jim Schutt
  2011-03-03 23:30                                             ` Colin McCabe
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 23:06 UTC (permalink / raw)
  To: Colin McCabe; +Cc: Yehuda Sadeh Weinraub, Sage Weil, Gregory Farnum, ceph-devel

Hi Colin,

On Thu, 2011-03-03 at 14:53 -0700, Colin McCabe wrote:
> Oh, SIGPIPE, my old nemesis. I should have guessed!
> 
> I think it's time to block SIGPIPE everywhere... It's much better to
> get EPIPE than to use a signal handler for this.

I saw your commit d1fce13f9855.

It seems like a41865e323 can be reverted, and
everything should work correctly?

-- Jim

> 
> regards,
> Colin
> 
> 
> On Thu, Mar 3, 2011 at 12:55 PM, Yehuda Sadeh Weinraub
> <yehudasa@gmail.com> wrote:
> > On Thu, Mar 3, 2011 at 12:47 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> >> Has something maybe changed in signal handling recently?
> >>
> >> Maybe SIGPIPE used to be blocked, and sendmsg() would
> >> return -EPIPE, but now it's not blocked and not handled?
> >>
> >> This bit in linux-2.6.git/net/core/stream.c is what made
> >> me wonder, but maybe it's a red herring:
> >>
> >> int sk_stream_error(struct sock *sk, int flags, int err)
> >> {
> >>        if (err == -EPIPE)
> >>                err = sock_error(sk) ? : -EPIPE;
> >>        if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
> >>                send_sig(SIGPIPE, current, 0);
> >>        return err;
> >> }
> >
> > It was actually just changed at
> > 35c4a9ffeadfe202b247c8e23719518a874f54e6, so if you're on latest
> > master then it might be it. You can try reverting that commit, or can
> > try this:
> >
> > index da22c7c..6f746d4 100644
> > --- a/src/msg/SimpleMessenger.cc
> > +++ b/src/msg/SimpleMessenger.cc
> > @@ -1991,7 +1991,7 @@ int SimpleMessenger::Pipe::do_sendmsg(int sd,
> > struct msghdr *msg, int len, bool
> >       assert(l == len);
> >     }
> >
> > -    int r = ::sendmsg(sd, msg, more ? MSG_MORE : 0);
> > +    int r = ::sendmsg(sd, msg, MSG_NOSIGNAL | (more ? MSG_MORE : 0));
> >     if (r == 0)
> >       dout(10) << "do_sendmsg hmm do_sendmsg got r==0!" << dendl;
> >     if (r < 0) {
> >
> >
> > Yehuda
> >>
> >> -- Jim
> >>
> >>
> >>
> >>
> >
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 23:06                                           ` Jim Schutt
@ 2011-03-03 23:30                                             ` Colin McCabe
  2011-03-03 23:37                                               ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Colin McCabe @ 2011-03-03 23:30 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Yehuda Sadeh Weinraub, Sage Weil, Gregory Farnum, ceph-devel

Hi Jim,

It's true that either one of these commits should do the trick, but I
would rather keep a41865e323 to make it explicit that we don't want
SIGPIPE. It's kind of a form of documentation in the code ("oh, I see
they're not using SIGPIPE"), and more documentation is a good thing.

The other issue is that we shouldn't alter the signal disposition of
library users. So it's prudent to put MSG_NOSIGNAL on all of our calls
to send() and sendmsg() for consistency, in case we ever end up calling
send() from a library user's thread that we didn't create. (I
don't think we do currently, but we may in the future, I guess.)
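
As a sketch of that convention (hypothetical helper, not Ceph's actual
API):

#include <sys/types.h>
#include <sys/socket.h>

// Every send goes through one place that always ORs in MSG_NOSIGNAL,
// so no call site can forget it, even on a thread whose signal setup
// we don't control.
static ssize_t safe_send(int sd, const void *buf, size_t len, int flags)
{
  return send(sd, buf, len, flags | MSG_NOSIGNAL);
}

int main()
{
  // No-op demo: sending on fd -1 just returns -1/EBADF; the point is
  // only that the flag is supplied in exactly one place.
  char c = 0;
  return safe_send(-1, &c, 1, 0) < 0 ? 0 : 1;
}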

cheers,
Colin


On Thu, Mar 3, 2011 at 3:06 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> Hi Colin,
>
> On Thu, 2011-03-03 at 14:53 -0700, Colin McCabe wrote:
>> Oh, SIGPIPE, my old nemesis. I should have guessed!
>>
>> I think it's time to block SIGPIPE everywhere... It's much better to
>> get EPIPE than to use a signal handler for this.
>
> I saw your commit d1fce13f9855.
>
> It seems like a41865e323 can be reverted, and
> everything should work correctly?
>
> -- Jim
>
>>
>> regards,
>> Colin
>>
>>
>> On Thu, Mar 3, 2011 at 12:55 PM, Yehuda Sadeh Weinraub
>> <yehudasa@gmail.com> wrote:
>> > On Thu, Mar 3, 2011 at 12:47 PM, Jim Schutt <jaschut@sandia.gov> wrote:
>> >> Has something maybe changed in signal handling recently?
>> >>
>> >> Maybe SIGPIPE used to be blocked, and sendmsg() would
>> >> return -EPIPE, but now it's not blocked and not handled?
>> >>
>> >> This bit in linux-2.6.git/net/core/stream.c is what made
>> >> me wonder, but maybe it's a red herring:
>> >>
>> >> int sk_stream_error(struct sock *sk, int flags, int err)
>> >> {
>> >>        if (err == -EPIPE)
>> >>                err = sock_error(sk) ? : -EPIPE;
>> >>        if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
>> >>                send_sig(SIGPIPE, current, 0);
>> >>        return err;
>> >> }
>> >
>> > It was actually just changed at
>> > 35c4a9ffeadfe202b247c8e23719518a874f54e6, so if you're on latest
>> > master then it might be it. You can try reverting that commit, or can
>> > try this:
>> >
>> > index da22c7c..6f746d4 100644
>> > --- a/src/msg/SimpleMessenger.cc
>> > +++ b/src/msg/SimpleMessenger.cc
>> > @@ -1991,7 +1991,7 @@ int SimpleMessenger::Pipe::do_sendmsg(int sd,
>> > struct msghdr *msg, int len, bool
>> >       assert(l == len);
>> >     }
>> >
>> > -    int r = ::sendmsg(sd, msg, more ? MSG_MORE : 0);
>> > +    int r = ::sendmsg(sd, msg, MSG_NOSIGNAL | (more ? MSG_MORE : 0));
>> >     if (r == 0)
>> >       dout(10) << "do_sendmsg hmm do_sendmsg got r==0!" << dendl;
>> >     if (r < 0) {
>> >
>> >
>> > Yehuda
>> >>
>> >> -- Jim
>> >>
>> >>
>> >>
>> >>
>> >
>>
>
>


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-03 23:30                                             ` Colin McCabe
@ 2011-03-03 23:37                                               ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-03 23:37 UTC (permalink / raw)
  To: Colin McCabe; +Cc: Yehuda Sadeh Weinraub, Sage Weil, Gregory Farnum, ceph-devel


On Thu, 2011-03-03 at 16:30 -0700, Colin McCabe wrote:
> Hi Jim,
> 
> It's true that either one of these commits should do the trick, but I
> would rather keep a41865e323 to make it explicit that we don't want
> SIGPIPE. It's kind of a form of documentation in the code ("oh, I see
> they're not using SIGPIPE"), and more documentation is a good thing.

OK, that makes sense.

> 
> The other issue is that we shouldn't alter the signal disposition of
> library users. So it's prudent to put MSG_NOSIGNAL on all of our calls
> to send() and sendmsg() for consistency, in case we ever end up calling
> send() from a library user's thread that we didn't create. (I
> don't think we do currently, but we may in the future, I guess.)

Right, I didn't think about the library case.
Thanks for filling me in.

-- Jim

> 
> cheers,
> Colin
> 
> 
> On Thu, Mar 3, 2011 at 3:06 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> > Hi Colin,
> >
> > On Thu, 2011-03-03 at 14:53 -0700, Colin McCabe wrote:
> >> Oh, SIGPIPE, my old nemesis. I should have guessed!
> >>
> >> I think it's time to block SIGPIPE everywhere... It's much better to
> >> get EPIPE than to use a signal handler for this.
> >
> > I saw your commit d1fce13f9855.
> >
> > It seems like a41865e323 can be reverted, and
> > everything should work correctly?
> >
> > -- Jim
> >
> >>
> >> regards,
> >> Colin
> >>
> >>
> >> On Thu, Mar 3, 2011 at 12:55 PM, Yehuda Sadeh Weinraub
> >> <yehudasa@gmail.com> wrote:
> >> > On Thu, Mar 3, 2011 at 12:47 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> >> >> Has something maybe changed in signal handling recently?
> >> >>
> >> >> Maybe SIGPIPE used to be blocked, and sendmsg() would
> >> >> return -EPIPE, but now it's not blocked and not handled?
> >> >>
> >> >> This bit in linux-2.6.git/net/core/stream.c is what made
> >> >> me wonder, but maybe it's a red herring:
> >> >>
> >> >> int sk_stream_error(struct sock *sk, int flags, int err)
> >> >> {
> >> >>        if (err == -EPIPE)
> >> >>                err = sock_error(sk) ? : -EPIPE;
> >> >>        if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
> >> >>                send_sig(SIGPIPE, current, 0);
> >> >>        return err;
> >> >> }
> >> >
> >> > It was actually just changed at
> >> > 35c4a9ffeadfe202b247c8e23719518a874f54e6, so if you're on latest
> >> > master then it might be it. You can try reverting that commit, or can
> >> > try this:
> >> >
> >> > index da22c7c..6f746d4 100644
> >> > --- a/src/msg/SimpleMessenger.cc
> >> > +++ b/src/msg/SimpleMessenger.cc
> >> > @@ -1991,7 +1991,7 @@ int SimpleMessenger::Pipe::do_sendmsg(int sd,
> >> > struct msghdr *msg, int len, bool
> >> >       assert(l == len);
> >> >     }
> >> >
> >> > -    int r = ::sendmsg(sd, msg, more ? MSG_MORE : 0);
> >> > +    int r = ::sendmsg(sd, msg, MSG_NOSIGNAL | (more ? MSG_MORE : 0));
> >> >     if (r == 0)
> >> >       dout(10) << "do_sendmsg hmm do_sendmsg got r==0!" << dendl;
> >> >     if (r < 0) {
> >> >
> >> >
> >> > Yehuda
> >> >>
> >> >> -- Jim
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >
> >
> >
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17 23:31             ` Jim Schutt
  2011-02-18  7:13               ` Sage Weil
@ 2011-03-09 16:02               ` Jim Schutt
  2011-03-09 17:07                 ` Gregory Farnum
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-09 16:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-02-17 at 16:31 -0700, Jim Schutt wrote:
> 
> On Thu, 2011-02-17 at 09:11 -0700, Sage Weil wrote:
> > On Thu, 17 Feb 2011, Jim Schutt wrote:
> > > Hi Sage,
> > >
> > > On Wed, 2011-02-16 at 17:54 -0700, Sage Weil wrote:
> > > > On Wed, 16 Feb 2011, Sage Weil wrote:
> > > > > shouldn't affect anything.  We may have missed something.. do you have a
> > > > > log showing this in action?
> > > >
> > > > Obviously yes, looking at your original email.  :)  At the beginning of
> > > > each log line we include a thread id.  What would be really helpful would
> > > > be to narrow down where in OSD::heartbeat_entry() and heartbeat() things
> > > > are blocking, either based on the existing output, or by adding additional
> > > > dout lines at interesting points in time.
> > >
> > > I'll take a deeper look at my existing logs with
> > > that in mind; let me know if you'd like me to
> > > send you some.
> > >
> > > I have also been looking at map_lock, as it seems
> > > to be shared between the heartbeat and map update
> > > threads.
> > >
> > > Would instrumenting acquiring/releasing that lock
> > > be helpful?  Is there some other lock that may
> > > be more fruitful to instrument?  I can reproduce
> > > pretty reliably, so adding instrumentation is
> > > no problem.
> >
> > The heartbeat thread is doing a map_lock.try_get_read() because it
> > frequently is held by another thread, so that shouldn't ever block.
> >
> > The possibilities I see are:
> >  - peer_stat_lock
> >  - the monc->sub_want / renew_subs calls (monc has an internal lock),
> > although that code should only trigger with a single osd.  :/
> >  - heartbeat_lock itself could be held by another thread; i'd instrument
> > all locks/unlocks there, along with the wakeup in heartbeat().
> 
> If I did the instrumentation right, there's no sign that
> any of these locks are contended.
> 

Heh.  Evidently I didn't do the instrumentation right.
Or more specifically, I didn't look in the right places
for the result I needed.

Once I understood the code/logging enough to write a
script to look for delayed osd_ping message processing,
I found evidence that the heartbeat lock is contended:

osd.62.log:798249:2011-03-09 08:23:46.710315 7f361bb09940 -- 172.17.40.29:6820/28024 >> 172.17.40.32:6802/28608 pipe(0x25e3c40 sd=117 pgs=70 cs=1 l=0).reader got message 254 0x290c040 osd_ping(e5 as_of 5) v1
osd.62.log:798254:2011-03-09 08:23:46.710380 7f3633682940 -- 172.17.40.29:6820/28024 dispatch_entry pipe 0x25e3c40 dequeued 0x290c040
osd.62.log:798255:2011-03-09 08:23:46.710393 7f3633682940 -- 172.17.40.29:6820/28024 <== osd80 172.17.40.32:6802/28608 254 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (757971450 0 0) 0x290c040 con 0x25e3ec0
osd.62.log:798256:2011-03-09 08:23:46.710401 7f3633682940 osd62 5 heartbeat_dispatch 0x290c040
osd.62.log:798257:2011-03-09 08:23:46.710415 7f3633682940 osd62 5 handle_osd_ping from osd80 got stat stat(2011-03-09 08:23:46.703991 oprate=7.91906 qlen=4.5 recent_qlen=4.43333 rdlat=0 / 0 fshedin=0)
osd.62.log:798258:2011-03-09 08:23:46.710422 7f3633682940 osd62 5 handle_osd_ping wants heartbeat_lock
osd.62.log:804833:2011-03-09 08:23:59.549923 7f3633682940 osd62 5 handle_osd_ping got heartbeat_lock
osd.62.log:804834:2011-03-09 08:23:59.549940 7f3633682940 osd62 5 handle_osd_ping wants read on map_lock
osd.62.log:804835:2011-03-09 08:23:59.549947 7f3633682940 osd62 5 handle_osd_ping got read on map_lock
osd.62.log:804836:2011-03-09 08:23:59.549965 7f3633682940 osd62 5 _share_map_incoming osd80 172.17.40.32:6802/28608 5
osd.62.log:804837:2011-03-09 08:23:59.549980 7f3633682940 osd62 5 take_peer_stat wants peer_stat_lock
osd.62.log:804838:2011-03-09 08:23:59.549986 7f3633682940 osd62 5 take_peer_stat got peer_stat_lock
osd.62.log:804839:2011-03-09 08:23:59.550001 7f3633682940 osd62 5 take_peer_stat peer osd80 stat(2011-03-09 08:23:46.703991 oprate=7.91906 qlen=4.5 recent_qlen=4.43333 rdlat=0 / 0 fshedin=0)
osd.62.log:804840:2011-03-09 08:23:59.550009 7f3633682940 osd62 5 take_peer_stat dropping peer_stat_lock
osd.62.log:804841:2011-03-09 08:23:59.550036 7f3633682940 osd62 5 handle_osd_ping dropping read on map_lock
osd.62.log:804842:2011-03-09 08:23:59.550043 7f3633682940 osd62 5 handle_osd_ping dropping heartbeat_lock
osd.62.log:804843:2011-03-09 08:23:59.550062 7f3633682940 -- 172.17.40.29:6820/28024 dispatch_throttle_release 61 to dispatch throttler 26840/35000000
osd.62.log:804844:2011-03-09 08:23:59.550073 7f3633682940 -- 172.17.40.29:6820/28024 dispatch_entry done with 0x290c040 que_et 0.000078 op_et 12.839669 tot_et 12.839747

I still need to gather evidence on who is holding heartbeat_lock
in cases like this.  Still digging.....
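
One way to get at that is a scoped wrapper that logs the thread id
whenever a lock is waited on, or held, past a threshold.  Illustrative
only (it uses a bare pthread mutex; the real instrumentation would go
through Ceph's Mutex and dout machinery):

#include <pthread.h>
#include <sys/time.h>
#include <cstdio>

static double now_secs()
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return tv.tv_sec + tv.tv_usec / 1e6;
}

// Scoped lock that reports long waits and long hold times, tagged with
// the acquiring thread id, so a contended lock shows up in the log the
// same way the delayed handle_osd_ping above does.
class TimedLock {
  pthread_mutex_t *m;
  const char *name;
  double t_locked;
public:
  TimedLock(pthread_mutex_t *m_, const char *name_) : m(m_), name(name_) {
    double t0 = now_secs();
    pthread_mutex_lock(m);
    t_locked = now_secs();
    if (t_locked - t0 > 1.0)
      fprintf(stderr, "%lx waited %.3fs for %s\n",
              (unsigned long)pthread_self(), t_locked - t0, name);
  }
  ~TimedLock() {
    double held = now_secs() - t_locked;
    if (held > 1.0)
      fprintf(stderr, "%lx held %s for %.3fs\n",
              (unsigned long)pthread_self(), name, held);
    pthread_mutex_unlock(m);
  }
};

// Stand-in for the OSD's heartbeat_lock, protected the same way.
static pthread_mutex_t some_lock = PTHREAD_MUTEX_INITIALIZER;

int main()
{
  TimedLock l(&some_lock, "some_lock");
  // ... a critical section that might be slow ...
  return 0;
}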

I'm sorry it took me so long to find some evidence about what 
was going on.

-- Jim




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-09 16:02               ` Jim Schutt
@ 2011-03-09 17:07                 ` Gregory Farnum
  2011-03-09 18:36                   ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Gregory Farnum @ 2011-03-09 17:07 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, ceph-devel

On Wednesday, March 9, 2011 at 8:02 AM, Jim Schutt wrote:
> Heh. Evidently I didn't do the instrumentation right.
> Or more specifically, I didn't look in the right places
> for the result I needed.
> 
> Once I understood the code/logging enough to write a
> script to look for delayed osd_ping message processing,
> I found evidence that the heartbeat lock is contended:
> 
> osd.62.log:798249:2011-03-09 08:23:46.710315 7f361bb09940 -- 172.17.40.29:6820/28024 >> 172.17.40.32:6802/28608 pipe(0x25e3c40 sd=117 pgs=70 cs=1 l=0).reader got message 254 0x290c040 osd_ping(e5 as_of 5) v1
> osd.62.log:798254:2011-03-09 08:23:46.710380 7f3633682940 -- 172.17.40.29:6820/28024 dispatch_entry pipe 0x25e3c40 dequeued 0x290c040
> osd.62.log:798255:2011-03-09 08:23:46.710393 7f3633682940 -- 172.17.40.29:6820/28024 <== osd80 172.17.40.32:6802/28608 254 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (757971450 0 0) 0x290c040 con 0x25e3ec0
> osd.62.log:798256:2011-03-09 08:23:46.710401 7f3633682940 osd62 5 heartbeat_dispatch 0x290c040
> osd.62.log:798257:2011-03-09 08:23:46.710415 7f3633682940 osd62 5 handle_osd_ping from osd80 got stat stat(2011-03-09 08:23:46.703991 oprate=7.91906 qlen=4.5 recent_qlen=4.43333 rdlat=0 / 0 fshedin=0)
> osd.62.log:798258:2011-03-09 08:23:46.710422 7f3633682940 osd62 5 handle_osd_ping wants heartbeat_lock
> osd.62.log:804833:2011-03-09 08:23:59.549923 7f3633682940 osd62 5 handle_osd_ping got heartbeat_lock
> osd.62.log:804834:2011-03-09 08:23:59.549940 7f3633682940 osd62 5 handle_osd_ping wants read on map_lock
> osd.62.log:804835:2011-03-09 08:23:59.549947 7f3633682940 osd62 5 handle_osd_ping got read on map_lock
> osd.62.log:804836:2011-03-09 08:23:59.549965 7f3633682940 osd62 5 _share_map_incoming osd80 172.17.40.32:6802/28608 5
> osd.62.log:804837:2011-03-09 08:23:59.549980 7f3633682940 osd62 5 take_peer_stat wants peer_stat_lock
> osd.62.log:804838:2011-03-09 08:23:59.549986 7f3633682940 osd62 5 take_peer_stat got peer_stat_lock
> osd.62.log:804839:2011-03-09 08:23:59.550001 7f3633682940 osd62 5 take_peer_stat peer osd80 stat(2011-03-09 08:23:46.703991 oprate=7.91906 qlen=4.5 recent_qlen=4.43333 rdlat=0 / 0 fshedin=0)
> osd.62.log:804840:2011-03-09 08:23:59.550009 7f3633682940 osd62 5 take_peer_stat dropping peer_stat_lock
> osd.62.log:804841:2011-03-09 08:23:59.550036 7f3633682940 osd62 5 handle_osd_ping dropping read on map_lock
> osd.62.log:804842:2011-03-09 08:23:59.550043 7f3633682940 osd62 5 handle_osd_ping dropping heartbeat_lock
> osd.62.log:804843:2011-03-09 08:23:59.550062 7f3633682940 -- 172.17.40.29:6820/28024 dispatch_throttle_release 61 to dispatch throttler 26840/35000000
> osd.62.log:804844:2011-03-09 08:23:59.550073 7f3633682940 -- 172.17.40.29:6820/28024 dispatch_entry done with 0x290c040 que_et 0.000078 op_et 12.839669 tot_et 12.839747
> 
> I still need to gather evidence on who is holding heartbeat_lock
> in cases like this. Still digging.....
> 
> I'm sorry it took me so long to find some evidence about what 
> was going on.
> 
> -- Jim

Are you going through map spam at this point? If so, I'd pay special attention to update_heartbeat_peers, which is going to iterate through each PG (...in the cluster. Hmm, I thought that was never supposed to happen) on every map update. This has been known to take a bit of time (how many PGs does your cluster have at this point?), although it may be a case of the debugging taking up more time than the actual processing here.

-Greg





* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-09 17:07                 ` Gregory Farnum
@ 2011-03-09 18:36                   ` Jim Schutt
  2011-03-09 19:37                     ` Gregory Farnum
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-09 18:36 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel


On Wed, 2011-03-09 at 10:07 -0700, Gregory Farnum wrote:
> On Wednesday, March 9, 2011 at 8:02 AM, Jim Schutt wrote:
> > Heh. Evidently I didn't do the instrumentation right.
> > Or more specifically, I didn't look in the right places
> > for the result I needed.
> > 
> > Once I understood the code/logging enough to write a
> > script to look for delayed osd_ping message processing,
> > I found evidence that the heartbeat lock is contended:

[snip]
> > 
> > I still need to gather evidence on who is holding heartbeat_lock
> > in cases like this. Still digging.....
> > 
> > I'm sorry it took me so long to find some evidence about what 
> > was going on.
> > 
> > -- Jim
> Are you going through map spam at this point? If so, I'd pay 
> special attention to update_heartbeat_peers, which is going to 
> iterate through each PG (...in the cluster. Hmm, I thought that 
> was never supposed to happen) on every map update. This has been 
> known to take a bit of time (how many PGs does your cluster have 
> at this point?), although it may be a case of the debugging 
> taking up more time than the actual processing here.

Here's another example with more debugging.  The
PG count during this interval is:

2011-03-09 10:35:58.306942    pg v379: 25344 pgs: 25344 active+clean; 12119 MB data, 12025 MB used, 44579 GB / 44787 GB avail
2011-03-09 10:36:42.177728    pg v462: 25344 pgs: 25344 active+clean; 46375 MB data, 72672 MB used, 44520 GB / 44787 GB avail

Check out the interval 10:36:23.473356 -- 10:36:27.922262

It looks to me like a heartbeat message submission is 
waiting on something?

I still need to see if all my stalls exhibit this pattern,
or if this is an isolated instance.

osd.0.log:812629:2011-03-09 10:36:23.472432 7f7e427e7940 -- 172.17.40.21:6802/17592 >> 172.17.40.29:6811/15398 pipe(0xf0c920 sd=44 pgs=89 cs=1 l=0).reader got message 267 0xfd0560 osd_ping(e5 as_of 5) v1
osd.0.log:812643:2011-03-09 10:36:23.472839 7f7e4fc00940 -- 172.17.40.21:6802/17592 dispatch_entry pipe 0xf0c920 dequeued 0xfd0560
osd.0.log:812644:2011-03-09 10:36:23.472866 7f7e4fc00940 -- 172.17.40.21:6802/17592 <== osd59 172.17.40.29:6811/15398 267 ==== osd_ping(e5 as_of 5) v1 ==== 61+0+0 (3590659439 0 0) 0xfd0560 con 0xee85a0
osd.0.log:812645:2011-03-09 10:36:23.472894 7f7e4fc00940 osd0 5 heartbeat_dispatch 0xfd0560
osd.0.log:812646:2011-03-09 10:36:23.472928 7f7e4fc00940 osd0 5 handle_osd_ping from osd59 got stat stat(2011-03-09 10:36:23.471405 oprate=1.57375 qlen=0 recent_qlen=1.66667 rdlat=0 / 0 fshedin=0)
osd.0.log:812647:2011-03-09 10:36:23.472943 7f7e4fc00940 osd0 5 handle_osd_ping wants heartbeat_lock
osd.0.log:812648:2011-03-09 10:36:23.473008 7f7e4c9f8940 osd0 5 update_osd_stat osd_stat(594 MB used, 463 GB avail, 466 GB total, peers [9,11,14,15,16,17,19,24,25,29,31,32,36,37,38,39,40,42,43,46,47,51,53,54,56,58,59,61,62,64,65,67,68,70,72,73,77,80,81,87,90,91,93,94]/[8,9,12,14,15,16,17,18,21,22,23,24,25,26,27,29,31,32,34,36,37,38,39,40,42,43,44,46,49,50,51,58,60,65,66,69,70,71,74,75,77,78,79,81,82,86,88,89,90,91,93])
osd.0.log:812649:2011-03-09 10:36:23.473046 7f7e4c9f8940 osd0 5 _refresh_my_stat stat(2011-03-09 10:36:23.472606 oprate=3.55328 qlen=0 recent_qlen=3.33333 rdlat=0 / 0 fshedin=0)
osd.0.log:812650:2011-03-09 10:36:23.473099 7f7e4c9f8940 osd0 5 heartbeat: stat(2011-03-09 10:36:23.472606 oprate=3.55328 qlen=0 recent_qlen=3.33333 rdlat=0 / 0 fshedin=0)
osd.0.log:812651:2011-03-09 10:36:23.473128 7f7e4c9f8940 osd0 5 heartbeat: osd_stat(594 MB used, 463 GB avail, 466 GB total, peers [9,11,14,15,16,17,19,24,25,29,31,32,36,37,38,39,40,42,43,46,47,51,53,54,56,58,59,61,62,64,65,67,68,70,72,73,77,80,81,87,90,91,93,94]/[8,9,12,14,15,16,17,18,21,22,23,24,25,26,27,29,31,32,34,36,37,38,39,40,42,43,44,46,49,50,51,58,60,65,66,69,70,71,74,75,77,78,79,81,82,86,88,89,90,91,93])
osd.0.log:812652:2011-03-09 10:36:23.473137 7f7e4c9f8940 osd0 5 heartbeat map_locked=1
osd.0.log:812653:2011-03-09 10:36:23.473149 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd8
osd.0.log:812654:2011-03-09 10:36:23.473161 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd8
osd.0.log:812655:2011-03-09 10:36:23.473177 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd8 172.17.40.22:6802/20244 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446fbc10
osd.0.log:812656:2011-03-09 10:36:23.473195 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.22:6802/20244, have pipe.
osd.0.log:812657:2011-03-09 10:36:23.473216 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd9
osd.0.log:812658:2011-03-09 10:36:23.473230 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd9
osd.0.log:812659:2011-03-09 10:36:23.473247 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd9 172.17.40.22:6805/20349 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441d7270
osd.0.log:812661:2011-03-09 10:36:23.473288 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.22:6805/20349, have pipe.
osd.0.log:812663:2011-03-09 10:36:23.473325 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd12
osd.0.log:812664:2011-03-09 10:36:23.473338 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd12
osd.0.log:812665:2011-03-09 10:36:23.473356 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd12 172.17.40.22:6814/20653 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4428ad40
osd.0.log:812893:2011-03-09 10:36:24.052769 7f7e5640d940 osd0 5 tick getting read lock on map_lock
osd.0.log:812894:2011-03-09 10:36:24.052779 7f7e5640d940 osd0 5 tick got read lock on map_lock
osd.0.log:812896:2011-03-09 10:36:24.052856 7f7e5640d940 osd0 5 tick wants heatbeat_lock
osd.0.log:813027:2011-03-09 10:36:24.507893 7f7e54c0a940 osd0 5 get_my_stat_for wants heartbeat_lock
osd.0.log:813992:2011-03-09 10:36:27.922262 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.22:6814/20653, have pipe.
osd.0.log:813993:2011-03-09 10:36:27.922299 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd14
osd.0.log:813994:2011-03-09 10:36:27.922309 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd14
osd.0.log:813995:2011-03-09 10:36:27.922322 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd14 172.17.40.22:6820/20861 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e442c9620
osd.0.log:813996:2011-03-09 10:36:27.922336 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.22:6820/20861, have pipe.
osd.0.log:813997:2011-03-09 10:36:27.922367 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd15
osd.0.log:813998:2011-03-09 10:36:27.922376 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd15
osd.0.log:813999:2011-03-09 10:36:27.922388 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd15 172.17.40.22:6823/20953 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e442bb420
osd.0.log:814000:2011-03-09 10:36:27.922402 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.22:6823/20953, have pipe.
osd.0.log:814001:2011-03-09 10:36:27.922412 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd16
osd.0.log:814002:2011-03-09 10:36:27.922421 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd16
osd.0.log:814003:2011-03-09 10:36:27.922432 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd16 172.17.40.23:6802/11814 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4427ef80
osd.0.log:814241:2011-03-09 10:36:27.930775 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.23:6802/11814, have pipe.
osd.0.log:814242:2011-03-09 10:36:27.930805 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd17
osd.0.log:814243:2011-03-09 10:36:27.930815 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd17
osd.0.log:814244:2011-03-09 10:36:27.930828 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd17 172.17.40.23:6805/11921 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441797e0
osd.0.log:814245:2011-03-09 10:36:27.930840 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.23:6805/11921, have pipe.
osd.0.log:814246:2011-03-09 10:36:27.930855 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd18
osd.0.log:814247:2011-03-09 10:36:27.930863 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd18
osd.0.log:814248:2011-03-09 10:36:27.930873 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd18 172.17.40.23:6808/12026 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e442bac90
osd.0.log:814249:2011-03-09 10:36:27.930884 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.23:6808/12026, have pipe.
osd.0.log:814250:2011-03-09 10:36:27.930902 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd21
osd.0.log:814251:2011-03-09 10:36:27.930913 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd21
osd.0.log:814252:2011-03-09 10:36:27.930923 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd21 172.17.40.23:6817/12326 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e442bb260
osd.0.log:814423:2011-03-09 10:36:27.944270 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.23:6817/12326, have pipe.
osd.0.log:814424:2011-03-09 10:36:27.944291 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd22
osd.0.log:814425:2011-03-09 10:36:27.944318 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd22
osd.0.log:814426:2011-03-09 10:36:27.944330 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd22 172.17.40.23:6820/12418 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441ded40
osd.0.log:814427:2011-03-09 10:36:27.944343 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.23:6820/12418, have pipe.
osd.0.log:814428:2011-03-09 10:36:27.944358 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd23
osd.0.log:814429:2011-03-09 10:36:27.944367 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd23
osd.0.log:814430:2011-03-09 10:36:27.944376 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd23 172.17.40.23:6823/12534 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441afaa0
osd.0.log:814431:2011-03-09 10:36:27.944388 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.23:6823/12534, have pipe.
osd.0.log:814432:2011-03-09 10:36:27.944401 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd24
osd.0.log:814433:2011-03-09 10:36:27.944411 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd24
osd.0.log:814434:2011-03-09 10:36:27.944421 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd24 172.17.40.24:6802/7915 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4417fd50
osd.0.log:814435:2011-03-09 10:36:27.944431 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.24:6802/7915, have pipe.
osd.0.log:814436:2011-03-09 10:36:27.944448 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd25
osd.0.log:814437:2011-03-09 10:36:27.944457 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd25
osd.0.log:814438:2011-03-09 10:36:27.944468 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd25 172.17.40.24:6805/8007 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4473d410
osd.0.log:814439:2011-03-09 10:36:27.944479 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.24:6805/8007, have pipe.
osd.0.log:814440:2011-03-09 10:36:27.944494 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd26
osd.0.log:814441:2011-03-09 10:36:27.944503 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd26
osd.0.log:814442:2011-03-09 10:36:27.944514 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd26 172.17.40.24:6808/8112 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441a70a0
osd.0.log:814443:2011-03-09 10:36:27.944525 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.24:6808/8112, have pipe.
osd.0.log:814444:2011-03-09 10:36:27.944538 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd27
osd.0.log:814445:2011-03-09 10:36:27.944547 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd27
osd.0.log:814446:2011-03-09 10:36:27.944557 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd27 172.17.40.24:6811/8215 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44080b70
osd.0.log:814447:2011-03-09 10:36:27.944567 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.24:6811/8215, have pipe.
osd.0.log:814448:2011-03-09 10:36:27.944582 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd29
osd.0.log:814449:2011-03-09 10:36:27.944592 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd29
osd.0.log:814450:2011-03-09 10:36:27.944602 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd29 172.17.40.24:6817/8412 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44231f10
osd.0.log:814451:2011-03-09 10:36:27.944613 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.24:6817/8412, have pipe.
osd.0.log:814452:2011-03-09 10:36:27.944627 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd31
osd.0.log:814453:2011-03-09 10:36:27.944635 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd31
osd.0.log:814454:2011-03-09 10:36:27.944646 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd31 172.17.40.24:6823/8620 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44064bd0
osd.0.log:814455:2011-03-09 10:36:27.944657 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.24:6823/8620, have pipe.
osd.0.log:814456:2011-03-09 10:36:27.944677 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd32
osd.0.log:814457:2011-03-09 10:36:27.944686 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd32
osd.0.log:814458:2011-03-09 10:36:27.944695 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd32 172.17.40.25:6802/16712 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446fbdd0
osd.0.log:814459:2011-03-09 10:36:27.944706 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.25:6802/16712, have pipe.
osd.0.log:814460:2011-03-09 10:36:27.944722 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd34
osd.0.log:814461:2011-03-09 10:36:27.944744 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd34
osd.0.log:814462:2011-03-09 10:36:27.944755 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd34 172.17.40.25:6808/16908 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44222a70
osd.0.log:814463:2011-03-09 10:36:27.944765 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.25:6808/16908, have pipe.
osd.0.log:814464:2011-03-09 10:36:27.944781 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd36
osd.0.log:814465:2011-03-09 10:36:27.944790 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd36
osd.0.log:814466:2011-03-09 10:36:27.944799 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd36 172.17.40.25:6814/17105 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441776d0
osd.0.log:814467:2011-03-09 10:36:27.944810 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.25:6814/17105, have pipe.
osd.0.log:814468:2011-03-09 10:36:27.944828 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd37
osd.0.log:814469:2011-03-09 10:36:27.944837 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd37
osd.0.log:814470:2011-03-09 10:36:27.944847 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd37 172.17.40.25:6817/17208 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4400c970
osd.0.log:814471:2011-03-09 10:36:27.944861 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.25:6817/17208, have pipe.
osd.0.log:814472:2011-03-09 10:36:27.944878 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd38
osd.0.log:814473:2011-03-09 10:36:27.944887 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd38
osd.0.log:814474:2011-03-09 10:36:27.944897 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd38 172.17.40.25:6820/17313 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446f1b90
osd.0.log:814475:2011-03-09 10:36:27.944907 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.25:6820/17313, have pipe.
osd.0.log:814476:2011-03-09 10:36:27.944920 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd39
osd.0.log:814477:2011-03-09 10:36:27.944929 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd39
osd.0.log:814478:2011-03-09 10:36:27.944940 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd39 172.17.40.25:6823/17405 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446cb210
osd.0.log:814701:2011-03-09 10:36:27.958449 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.25:6823/17405, have pipe.
osd.0.log:814702:2011-03-09 10:36:27.958490 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd40
osd.0.log:814703:2011-03-09 10:36:27.958501 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd40
osd.0.log:814704:2011-03-09 10:36:27.958514 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd40 172.17.40.27:6802/15995 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446cb7b0
osd.0.log:814705:2011-03-09 10:36:27.958533 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.27:6802/15995, have pipe.
osd.0.log:814706:2011-03-09 10:36:27.958553 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd42
osd.0.log:814707:2011-03-09 10:36:27.958560 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd42
osd.0.log:814708:2011-03-09 10:36:27.958571 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd42 172.17.40.27:6808/16203 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446cb970
osd.0.log:814709:2011-03-09 10:36:27.958582 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.27:6808/16203, have pipe.
osd.0.log:814710:2011-03-09 10:36:27.958595 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd43
osd.0.log:814711:2011-03-09 10:36:27.958605 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd43
osd.0.log:814712:2011-03-09 10:36:27.958614 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd43 172.17.40.27:6811/16295 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446cbb30
osd.0.log:814713:2011-03-09 10:36:27.958625 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.27:6811/16295, have pipe.
osd.0.log:814714:2011-03-09 10:36:27.958654 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd44
osd.0.log:814715:2011-03-09 10:36:27.958665 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd44
osd.0.log:814716:2011-03-09 10:36:27.958681 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd44 172.17.40.27:6814/16403 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44173010
osd.0.log:814718:2011-03-09 10:36:27.958761 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.27:6814/16403, have pipe.
osd.0.log:814719:2011-03-09 10:36:27.958780 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd46
osd.0.log:814720:2011-03-09 10:36:27.958789 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd46
osd.0.log:814721:2011-03-09 10:36:27.958799 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd46 172.17.40.27:6820/16595 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44173250
osd.0.log:814722:2011-03-09 10:36:27.958809 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.27:6820/16595, have pipe.
osd.0.log:814723:2011-03-09 10:36:27.958823 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd49
osd.0.log:814724:2011-03-09 10:36:27.958832 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd49
osd.0.log:814725:2011-03-09 10:36:27.958843 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd49 172.17.40.28:6805/14276 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44173490
osd.0.log:814853:2011-03-09 10:36:27.978244 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.28:6805/14276, have pipe.
osd.0.log:814854:2011-03-09 10:36:27.978282 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd50
osd.0.log:814855:2011-03-09 10:36:27.978293 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd50
osd.0.log:814857:2011-03-09 10:36:27.978494 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd50 172.17.40.28:6808/14368 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44173810
osd.0.log:814858:2011-03-09 10:36:27.978526 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.28:6808/14368, have pipe.
osd.0.log:814859:2011-03-09 10:36:27.978560 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd51
osd.0.log:814860:2011-03-09 10:36:27.978571 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd51
osd.0.log:814861:2011-03-09 10:36:27.978582 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd51 172.17.40.28:6811/14484 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e441739d0
osd.0.log:814862:2011-03-09 10:36:27.978592 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.28:6811/14484, have pipe.
osd.0.log:814863:2011-03-09 10:36:27.978607 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd58
osd.0.log:814864:2011-03-09 10:36:27.978629 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd58
osd.0.log:814865:2011-03-09 10:36:27.978639 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd58 172.17.40.29:6808/15293 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44173b90
osd.0.log:814866:2011-03-09 10:36:27.978651 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.29:6808/15293, have pipe.
osd.0.log:814867:2011-03-09 10:36:27.978748 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd60
osd.0.log:814868:2011-03-09 10:36:27.978758 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd60
osd.0.log:814869:2011-03-09 10:36:27.978771 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd60 172.17.40.29:6814/15490 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44173d50
osd.0.log:814870:2011-03-09 10:36:27.978784 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.29:6814/15490, have pipe.
osd.0.log:814871:2011-03-09 10:36:27.978801 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd65
osd.0.log:814872:2011-03-09 10:36:27.978814 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd65
osd.0.log:814873:2011-03-09 10:36:27.978827 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd65 172.17.40.30:6805/17025 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460d050
osd.0.log:814874:2011-03-09 10:36:27.978839 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.30:6805/17025, have pipe.
osd.0.log:814875:2011-03-09 10:36:27.978860 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd66
osd.0.log:814876:2011-03-09 10:36:27.978871 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd66
osd.0.log:814877:2011-03-09 10:36:27.978881 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd66 172.17.40.30:6808/17117 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460d250
osd.0.log:814878:2011-03-09 10:36:27.978894 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.30:6808/17117, have pipe.
osd.0.log:814879:2011-03-09 10:36:27.978931 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd69
osd.0.log:814880:2011-03-09 10:36:27.978940 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd69
osd.0.log:814881:2011-03-09 10:36:27.978951 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd69 172.17.40.30:6817/17417 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460d490
osd.0.log:814882:2011-03-09 10:36:27.978966 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.30:6817/17417, have pipe.
osd.0.log:814883:2011-03-09 10:36:27.978986 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd70
osd.0.log:814884:2011-03-09 10:36:27.978995 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd70
osd.0.log:814885:2011-03-09 10:36:27.979005 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd70 172.17.40.30:6820/17522 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460d6d0
osd.0.log:814886:2011-03-09 10:36:27.979017 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.30:6820/17522, have pipe.
osd.0.log:814887:2011-03-09 10:36:27.979040 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd71
osd.0.log:814888:2011-03-09 10:36:27.979049 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd71
osd.0.log:814889:2011-03-09 10:36:27.979060 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd71 172.17.40.30:6823/17625 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460d910
osd.0.log:814890:2011-03-09 10:36:27.979072 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.30:6823/17625, have pipe.
osd.0.log:814891:2011-03-09 10:36:27.979090 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd74
osd.0.log:814892:2011-03-09 10:36:27.979099 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd74
osd.0.log:814893:2011-03-09 10:36:27.979109 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd74 172.17.40.31:6808/15828 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460db50
osd.0.log:814894:2011-03-09 10:36:27.979120 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.31:6808/15828, have pipe.
osd.0.log:814895:2011-03-09 10:36:27.979139 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd75
osd.0.log:814896:2011-03-09 10:36:27.979148 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd75
osd.0.log:814897:2011-03-09 10:36:27.979159 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd75 172.17.40.31:6811/15931 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4460ddb0
osd.0.log:814898:2011-03-09 10:36:27.979170 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.31:6811/15931, have pipe.
osd.0.log:814899:2011-03-09 10:36:27.979221 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd77
osd.0.log:814900:2011-03-09 10:36:27.979232 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd77
osd.0.log:814901:2011-03-09 10:36:27.979243 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd77 172.17.40.31:6817/16128 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44663010
osd.0.log:814902:2011-03-09 10:36:27.979255 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.31:6817/16128, have pipe.
osd.0.log:814903:2011-03-09 10:36:27.979272 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd78
osd.0.log:814904:2011-03-09 10:36:27.979281 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd78
osd.0.log:814905:2011-03-09 10:36:27.979292 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd78 172.17.40.31:6820/16233 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44663230
osd.0.log:814906:2011-03-09 10:36:27.979303 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.31:6820/16233, have pipe.
osd.0.log:814907:2011-03-09 10:36:27.979321 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd79
osd.0.log:814908:2011-03-09 10:36:27.979331 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd79
osd.0.log:814909:2011-03-09 10:36:27.979341 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd79 172.17.40.31:6823/16340 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44663490
osd.0.log:814910:2011-03-09 10:36:27.979354 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.31:6823/16340, have pipe.
osd.0.log:814911:2011-03-09 10:36:27.979375 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd81
osd.0.log:814912:2011-03-09 10:36:27.979384 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd81
osd.0.log:814913:2011-03-09 10:36:27.979395 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd81 172.17.40.32:6805/15965 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e446636f0
osd.0.log:814914:2011-03-09 10:36:27.979407 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.32:6805/15965, have pipe.
osd.0.log:814915:2011-03-09 10:36:27.979428 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd82
osd.0.log:814916:2011-03-09 10:36:27.979449 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd82
osd.0.log:814917:2011-03-09 10:36:27.979461 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd82 172.17.40.32:6808/16068 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44663950
osd.0.log:814918:2011-03-09 10:36:27.979475 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.32:6808/16068, have pipe.
osd.0.log:815022:2011-03-09 10:36:27.984720 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd86
osd.0.log:815023:2011-03-09 10:36:27.984741 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd86
osd.0.log:815024:2011-03-09 10:36:27.984756 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd86 172.17.40.32:6820/16464 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44663bb0
osd.0.log:815025:2011-03-09 10:36:27.984772 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.32:6820/16464, have pipe.
osd.0.log:815026:2011-03-09 10:36:27.984791 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd88
osd.0.log:815027:2011-03-09 10:36:27.984799 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd88
osd.0.log:815028:2011-03-09 10:36:27.984828 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd88 172.17.40.33:6802/23728 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e44663d70
osd.0.log:815029:2011-03-09 10:36:27.984839 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.33:6802/23728, have pipe.
osd.0.log:815030:2011-03-09 10:36:27.984854 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd89
osd.0.log:815031:2011-03-09 10:36:27.984862 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd89
osd.0.log:815032:2011-03-09 10:36:27.984872 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd89 172.17.40.33:6805/23820 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4466d010
osd.0.log:815033:2011-03-09 10:36:27.984882 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.33:6805/23820, have pipe.
osd.0.log:815034:2011-03-09 10:36:27.984895 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd90
osd.0.log:815035:2011-03-09 10:36:27.984903 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd90
osd.0.log:815036:2011-03-09 10:36:27.984912 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd90 172.17.40.33:6808/23912 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4466d250
osd.0.log:815037:2011-03-09 10:36:27.984922 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.33:6808/23912, have pipe.
osd.0.log:815038:2011-03-09 10:36:27.984936 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd91
osd.0.log:815039:2011-03-09 10:36:27.984943 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd91
osd.0.log:815040:2011-03-09 10:36:27.984952 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd91 172.17.40.33:6811/24028 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4466d490
osd.0.log:815041:2011-03-09 10:36:27.984962 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.33:6811/24028, have pipe.
osd.0.log:815042:2011-03-09 10:36:27.984975 7f7e4c9f8940 osd0 5 heartbeat allocating ping for osd93
osd.0.log:815043:2011-03-09 10:36:27.984983 7f7e4c9f8940 osd0 5 heartbeat sending ping to osd93
osd.0.log:815044:2011-03-09 10:36:27.984992 7f7e4c9f8940 -- 172.17.40.21:6802/17592 --> osd93 172.17.40.33:6817/24229 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f7e4466d6d0
osd.0.log:815045:2011-03-09 10:36:27.985002 7f7e4c9f8940 -- 172.17.40.21:6802/17592 submit_message osd_ping(e5 as_of 5) v1 remote, 172.17.40.33:6817/24229, have pipe.
osd.0.log:815046:2011-03-09 10:36:27.985014 7f7e4c9f8940 osd0 5 heartbeat check
osd.0.log:815047:2011-03-09 10:36:27.985048 7f7e4c9f8940 osd0 5 heartbeat lonely?
osd.0.log:815048:2011-03-09 10:36:27.985054 7f7e4c9f8940 osd0 5 heartbeat put map_lock
osd.0.log:815049:2011-03-09 10:36:27.985060 7f7e4c9f8940 osd0 5 heartbeat done, dropping heatbeat_lock
osd.0.log:815050:2011-03-09 10:36:27.985066 7f7e4c9f8940 osd0 5 heartbeat dropping peer_stat_lock
osd.0.log:815051:2011-03-09 10:36:27.985078 7f7e4c9f8940 osd0 5 heartbeat_entry sleeping via heatbeat_lock for 0.9
osd.0.log:815082:2011-03-09 10:36:27.985534 7f7e4fc00940 osd0 5 handle_osd_ping got heartbeat_lock
osd.0.log:815083:2011-03-09 10:36:27.985542 7f7e4fc00940 osd0 5 handle_osd_ping wants read on map_lock
osd.0.log:815084:2011-03-09 10:36:27.985548 7f7e4fc00940 osd0 5 handle_osd_ping got read on map_lock
osd.0.log:815085:2011-03-09 10:36:27.985560 7f7e4fc00940 osd0 5 _share_map_incoming osd59 172.17.40.29:6811/15398 5
osd.0.log:815086:2011-03-09 10:36:27.985569 7f7e4fc00940 osd0 5 take_peer_stat wants peer_stat_lock
osd.0.log:815087:2011-03-09 10:36:27.985575 7f7e4fc00940 osd0 5 take_peer_stat got peer_stat_lock
osd.0.log:815088:2011-03-09 10:36:27.985599 7f7e4fc00940 osd0 5 take_peer_stat peer osd59 stat(2011-03-09 10:36:23.471405 oprate=1.57375 qlen=0 recent_qlen=1.66667 rdlat=0 / 0 fshedin=0)
osd.0.log:815089:2011-03-09 10:36:27.985606 7f7e4fc00940 osd0 5 take_peer_stat dropping peer_stat_lock
osd.0.log:815090:2011-03-09 10:36:27.985614 7f7e4fc00940 osd0 5 handle_osd_ping dropping read on map_lock
osd.0.log:815091:2011-03-09 10:36:27.985620 7f7e4fc00940 osd0 5 handle_osd_ping dropping heartbeat_lock
osd.0.log:815092:2011-03-09 10:36:27.985634 7f7e4fc00940 -- 172.17.40.21:6802/17592 dispatch_throttle_release 61 to dispatch throttler 7625/35000000
osd.0.log:815093:2011-03-09 10:36:27.985643 7f7e4fc00940 -- 172.17.40.21:6802/17592 dispatch_entry done with 0xfd0560 que_et 0.000449 op_et 4.512749 tot_et 4.513198

-- Jim

> -Greg
> 
> 
> 
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-09 18:36                   ` Jim Schutt
@ 2011-03-09 19:37                     ` Gregory Farnum
  2011-03-10 23:09                       ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Gregory Farnum @ 2011-03-09 19:37 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, ceph-devel


On Wednesday, March 9, 2011 at 10:36 AM, Jim Schutt wrote:
> Here's another example with more debugging. The
> PG count during this interval is:
> 
> 2011-03-09 10:35:58.306942 pg v379: 25344 pgs: 25344 active+clean; 12119 MB data, 12025 MB used, 44579 GB / 44787 GB avail
> 2011-03-09 10:36:42.177728 pg v462: 25344 pgs: 25344 active+clean; 46375 MB data, 72672 MB used, 44520 GB / 44787 GB avail
> 
> Check out the interval 10:36:23.473356 -- 10:36:27.922262
> 
> It looks to me like a heartbeat message submission is 
> waiting on something?

Yes, it sure does. The only thing that should block between those output messages is getting the messenger lock, which *ought* to be fast. Either there are a lot of threads trying to send messages and the heartbeat thread is just getting unlucky, or there's a mistake in where and how the messenger locks (which is certainly possible, but in a brief audit it looks correct).
-Greg





* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-09 19:37                     ` Gregory Farnum
@ 2011-03-10 23:09                       ` Jim Schutt
  2011-03-10 23:21                         ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-10 23:09 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel


On Wed, 2011-03-09 at 12:37 -0700, Gregory Farnum wrote:
> On Wednesday, March 9, 2011 at 10:36 AM, Jim Schutt wrote:
> > Here's another example with more debugging. The
> > PG count during this interval is:
> > 
> > 2011-03-09 10:35:58.306942 pg v379: 25344 pgs: 25344 active+clean; 12119 MB data, 12025 MB used, 44579 GB / 44787 GB avail
> > 2011-03-09 10:36:42.177728 pg v462: 25344 pgs: 25344 active+clean; 46375 MB data, 72672 MB used, 44520 GB / 44787 GB avail
> > 
> > Check out the interval 10:36:23.473356 -- 10:36:27.922262
> > 
> > It looks to me like a heartbeat message submission is 
> > waiting on something?
> 
> Yes, it sure does. The only thing that should block between those output 
> messages is getting the messenger lock, which *ought* be fast. Either 
> there are a lot of threads trying to send messages and the heartbeat 
> thread is just getting unlucky, or there's a mistake in where and how 
> the messenger locks (which is certainly possible, but in a brief 
> audit it looks correct).

Or, delete is broken on my systems.  With some extra diagnostics, 
I get many instances of this sort of thing:

osd.10.log:946307:2011-03-10 15:38:38.519444 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd17 172.17.40.23:6805/8181 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9ac4041f0
osd.10.log:946348:2011-03-10 15:38:38.520124 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer encoding 310 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
osd.10.log:946349:2011-03-10 15:38:38.520142 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer sending 310 0x7fe9ac4041f0
osd.10.log:946350:2011-03-10 15:38:38.520156 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).write_message 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
osd.10.log:949167:2011-03-10 15:38:38.800447 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).reader got ack seq 310 >= 310 on 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
osd.10.log:954385:2011-03-10 15:38:46.184453 7fe9c8ccc940 RefCountedObject::put delete 0x7fe9ac4041f0 took 7.345873 secs!
osd.10.log:954386:2011-03-10 15:38:46.184471 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).handle_ack finished put on 0x7fe9ac4041f0

osd.10.log:954785:2011-03-10 15:38:46.192022 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd46 172.17.40.27:6820/12936 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9b4823d30
osd.10.log:955206:2011-03-10 15:38:46.205457 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer encoding 322 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
osd.10.log:955207:2011-03-10 15:38:46.205480 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer sending 322 0x7fe9b4823d30
osd.10.log:955208:2011-03-10 15:38:46.205494 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).write_message 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
osd.10.log:960397:2011-03-10 15:38:46.833161 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).reader got ack seq 322 >= 322 on 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
osd.10.log:969858:2011-03-10 15:38:58.211206 7fe9d0444940 RefCountedObject::put delete 0x7fe9b4823d30 took 11.378036 secs!
osd.10.log:969859:2011-03-10 15:38:58.211219 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).handle_ack finished put on 0x7fe9b4823d30


Since handle_ack() is under pipe_lock, heartbeat() cannot
queue new osd_ping messages until Message::put() completes,
right?
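
To make the dependency concrete, here is a toy standalone program (not Ceph
code; the names are just stand-ins for pipe_lock / handle_ack / heartbeat).
One thread holds the per-pipe lock across a slow delete the way handle_ack()
does, and the "heartbeat" thread measures how long it blocks just trying to
queue the next ping:

// toy_pipe_lock.cc -- illustration only, not Ceph code.
// build: g++ -O2 -pthread -std=c++11 toy_pipe_lock.cc
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex pipe_lock;   // stand-in for the per-Pipe lock

void reader() {
  // like the reader thread: handle_ack() runs under the pipe lock, and the
  // m->put() -> delete inside it is simulated here by a 7 second sleep
  std::lock_guard<std::mutex> l(pipe_lock);
  std::this_thread::sleep_for(std::chrono::seconds(7));
}

void heartbeat() {
  // like heartbeat() -> submit_message(): needs the same lock to queue
  // the next osd_ping on this pipe
  std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
  std::lock_guard<std::mutex> l(pipe_lock);
  std::chrono::duration<double> waited = std::chrono::steady_clock::now() - t0;
  std::cout << "heartbeat blocked " << waited.count()
            << " secs behind the slow delete" << std::endl;
}

int main() {
  std::thread r(reader);
  std::this_thread::sleep_for(std::chrono::milliseconds(100));  // let reader grab the lock
  std::thread h(heartbeat);
  r.join();
  h.join();
  return 0;
}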

It turns out my systems don't have tcmalloc.  Do you
think using it would help?

-- Jim

> -Greg





* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-10 23:09                       ` Jim Schutt
@ 2011-03-10 23:21                         ` Sage Weil
  2011-03-10 23:32                           ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-10 23:21 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 10 Mar 2011, Jim Schutt wrote:
> On Wed, 2011-03-09 at 12:37 -0700, Gregory Farnum wrote:
> > On Wednesday, March 9, 2011 at 10:36 AM, Jim Schutt wrote:
> > > Here's another example with more debugging. The
> > > PG count during this interval is:
> > > 
> > > 2011-03-09 10:35:58.306942 pg v379: 25344 pgs: 25344 active+clean; 12119 MB data, 12025 MB used, 44579 GB / 44787 GB avail
> > > 2011-03-09 10:36:42.177728 pg v462: 25344 pgs: 25344 active+clean; 46375 MB data, 72672 MB used, 44520 GB / 44787 GB avail
> > > 
> > > Check out the interval 10:36:23.473356 -- 10:36:27.922262
> > > 
> > > It looks to me like a heartbeat message submission is 
> > > waiting on something?
> > 
> > Yes, it sure does. The only thing that should block between those output 
> > messages is getting the messenger lock, which *ought* be fast. Either 
> > there are a lot of threads trying to send messages and the heartbeat 
> > thread is just getting unlucky, or there's a mistake in where and how 
> > the messenger locks (which is certainly possible, but in a brief 
> > audit it looks correct).
> 
> Or, delete is broken on my systems.  With some extra diagnostics, 
> I get many instances of this sort of thing:
> 
> osd.10.log:946307:2011-03-10 15:38:38.519444 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd17 172.17.40.23:6805/8181 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9ac4041f0
> osd.10.log:946348:2011-03-10 15:38:38.520124 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer encoding 310 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> osd.10.log:946349:2011-03-10 15:38:38.520142 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer sending 310 0x7fe9ac4041f0
> osd.10.log:946350:2011-03-10 15:38:38.520156 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).write_message 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> osd.10.log:949167:2011-03-10 15:38:38.800447 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).reader got ack seq 310 >= 310 on 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> osd.10.log:954385:2011-03-10 15:38:46.184453 7fe9c8ccc940 RefCountedObject::put delete 0x7fe9ac4041f0 took 7.345873 secs!
> osd.10.log:954386:2011-03-10 15:38:46.184471 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).handle_ack finished put on 0x7fe9ac4041f0
> 
> osd.10.log:954785:2011-03-10 15:38:46.192022 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd46 172.17.40.27:6820/12936 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9b4823d30
> osd.10.log:955206:2011-03-10 15:38:46.205457 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer encoding 322 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> osd.10.log:955207:2011-03-10 15:38:46.205480 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer sending 322 0x7fe9b4823d30
> osd.10.log:955208:2011-03-10 15:38:46.205494 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).write_message 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> osd.10.log:960397:2011-03-10 15:38:46.833161 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).reader got ack seq 322 >= 322 on 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> osd.10.log:969858:2011-03-10 15:38:58.211206 7fe9d0444940 RefCountedObject::put delete 0x7fe9b4823d30 took 11.378036 secs!
> osd.10.log:969859:2011-03-10 15:38:58.211219 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).handle_ack finished put on 0x7fe9b4823d30
> 
> Since handle_ack() is under pipe_lock, heartbeat() cannot
> queue new osd_ping messages until Message::put() completes,
> right?

Right.

> It turns out my systems don't have tcmalloc.  Do you
> think using it would help?

Hmm, maybe.  I wouldn't expect this behavior from any allocator, though!

Can you drill down a bit further and see if either of these is 
responsible?

  virtual ~Message() { 
    assert(nref.read() == 0);
    if (connection)
      connection->put();
    if (throttler)
      throttler->put(payload.length() + middle.length() + data.length());
  }

(msg/Message.h)

Thanks!
sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-10 23:21                         ` Sage Weil
@ 2011-03-10 23:32                           ` Jim Schutt
  2011-03-10 23:40                             ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-10 23:32 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-03-10 at 16:21 -0700, Sage Weil wrote:
> On Thu, 10 Mar 2011, Jim Schutt wrote:
> > On Wed, 2011-03-09 at 12:37 -0700, Gregory Farnum wrote:
> > > On Wednesday, March 9, 2011 at 10:36 AM, Jim Schutt wrote:
> > > > Here's another example with more debugging. The
> > > > PG count during this interval is:
> > > > 
> > > > 2011-03-09 10:35:58.306942 pg v379: 25344 pgs: 25344 active+clean; 12119 MB data, 12025 MB used, 44579 GB / 44787 GB avail
> > > > 2011-03-09 10:36:42.177728 pg v462: 25344 pgs: 25344 active+clean; 46375 MB data, 72672 MB used, 44520 GB / 44787 GB avail
> > > > 
> > > > Check out the interval 10:36:23.473356 -- 10:36:27.922262
> > > > 
> > > > It looks to me like a heartbeat message submission is 
> > > > waiting on something?
> > > 
> > > Yes, it sure does. The only thing that should block between those output 
> > > messages is getting the messenger lock, which *ought* be fast. Either 
> > > there are a lot of threads trying to send messages and the heartbeat 
> > > thread is just getting unlucky, or there's a mistake in where and how 
> > > the messenger locks (which is certainly possible, but in a brief 
> > > audit it looks correct).
> > 
> > Or, delete is broken on my systems.  With some extra diagnostics, 
> > I get many instances of this sort of thing:
> > 
> > osd.10.log:946307:2011-03-10 15:38:38.519444 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd17 172.17.40.23:6805/8181 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9ac4041f0
> > osd.10.log:946348:2011-03-10 15:38:38.520124 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer encoding 310 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> > osd.10.log:946349:2011-03-10 15:38:38.520142 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer sending 310 0x7fe9ac4041f0
> > osd.10.log:946350:2011-03-10 15:38:38.520156 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).write_message 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> > osd.10.log:949167:2011-03-10 15:38:38.800447 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).reader got ack seq 310 >= 310 on 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> > osd.10.log:954385:2011-03-10 15:38:46.184453 7fe9c8ccc940 RefCountedObject::put delete 0x7fe9ac4041f0 took 7.345873 secs!
> > osd.10.log:954386:2011-03-10 15:38:46.184471 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).handle_ack finished put on 0x7fe9ac4041f0
> > 
> > osd.10.log:954785:2011-03-10 15:38:46.192022 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd46 172.17.40.27:6820/12936 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9b4823d30
> > osd.10.log:955206:2011-03-10 15:38:46.205457 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer encoding 322 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> > osd.10.log:955207:2011-03-10 15:38:46.205480 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer sending 322 0x7fe9b4823d30
> > osd.10.log:955208:2011-03-10 15:38:46.205494 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).write_message 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> > osd.10.log:960397:2011-03-10 15:38:46.833161 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).reader got ack seq 322 >= 322 on 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> > osd.10.log:969858:2011-03-10 15:38:58.211206 7fe9d0444940 RefCountedObject::put delete 0x7fe9b4823d30 took 11.378036 secs!
> > osd.10.log:969859:2011-03-10 15:38:58.211219 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).handle_ack finished put on 0x7fe9b4823d30
> > 
> > Since handle_ack() is under pipe_lock, heartbeat() cannot
> > queue new osd_ping messages until Message::put() completes,
> > right?
> 
> Right.
> 
> > It turns out my systems don't have tcmalloc.  Do you
> > think using it would help?
> 
> Hmm, maybe.  I wouldn't expect this behavior from any allocator, though!
> 
> Can you drill down a bit further and see if either of these is 
> responsible?
> 
>   virtual ~Message() { 
>     assert(nref.read() == 0);
>     if (connection)
>       connection->put();
>     if (throttler)
>       throttler->put(payload.length() + middle.length() + data.length());
>   }
> 
> (msg/Message.h)

Hmmm, this is the patch I'm running to produce the output above.
It seems pretty definitive to me; am I missing something?

(I moved handle_ack() implementation into .cc to make
dout work via debug osd setting.)

From c103fc342eec412a041188031aff484c2fd3feea Mon Sep 17 00:00:00 2001
From: Jim Schutt <jaschut@sandia.gov>
Date: Thu, 10 Mar 2011 16:26:43 -0700
Subject: [PATCH] Instrument ack handling.

---
 src/msg/Message.h          |    8 +++++++-
 src/msg/SimpleMessenger.cc |   17 +++++++++++++++++
 src/msg/SimpleMessenger.h  |   15 +--------------
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/src/msg/Message.h b/src/msg/Message.h
index 3758b1b..ac32b94 100644
--- a/src/msg/Message.h
+++ b/src/msg/Message.h
@@ -154,8 +154,14 @@ struct RefCountedObject {
   }
   void put() {
     //generic_dout(0) << "RefCountedObject::put " << this << " " << nref.read() << " -> " << (nref.read() - 1) << dendl;
-    if (nref.dec() == 0)
+    if (nref.dec() == 0) {
+      utime_t s = g_clock.now();
       delete this;
+      utime_t e = g_clock.now();
+      if (e - s > 0.5) {
+	generic_dout(1) << "RefCountedObject::put delete " << this << " took " << e - s << " secs!" << dendl;
+      }
+    }
   }
 };
 
diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc
index 7df3d44..a86ced8 100644
--- a/src/msg/SimpleMessenger.cc
+++ b/src/msg/SimpleMessenger.cc
@@ -2243,6 +2243,23 @@ int SimpleMessenger::Pipe::write_message(Message *m)
   goto out;
 }
 
+/* Clean up sent list */
+void SimpleMessenger::Pipe::handle_ack(uint64_t seq)
+{
+  dout(15) << "reader got ack seq " << seq << dendl;
+  // trim sent list
+  while (!sent.empty() &&
+	 sent.front()->get_seq() <= seq) {
+    Message *m = sent.front();
+    sent.pop_front();
+    dout(10) << "reader got ack seq "
+	     << seq << " >= " << m->get_seq() << " on " << m << " " << *m << dendl;
+    m->put();
+    dout(20) << "handle_ack finished put on " << m << dendl;
+  }
+}
+
+
 
 /********************************************
  * SimpleMessenger
diff --git a/src/msg/SimpleMessenger.h b/src/msg/SimpleMessenger.h
index d6ee0df..4031836 100644
--- a/src/msg/SimpleMessenger.h
+++ b/src/msg/SimpleMessenger.h
@@ -174,20 +174,7 @@ private:
     void fail();
 
     void was_session_reset();
-
-    /* Clean up sent list */
-    void handle_ack(uint64_t seq) {
-      dout(15) << "reader got ack seq " << seq << dendl;
-      // trim sent list
-      while (!sent.empty() &&
-          sent.front()->get_seq() <= seq) {
-        Message *m = sent.front();
-        sent.pop_front();
-        dout(10) << "reader got ack seq "
-            << seq << " >= " << m->get_seq() << " on " << m << " " << *m << dendl;
-        m->put();
-      }
-    }
+    void handle_ack(uint64_t seq);
 
     // threads
     class Reader : public Thread {
-- 
1.6.6

-- Jim

> 
> Thanks!
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-10 23:32                           ` Jim Schutt
@ 2011-03-10 23:40                             ` Sage Weil
  2011-03-11 14:51                               ` Jim Schutt
  2011-03-11 18:26                               ` Jim Schutt
  0 siblings, 2 replies; 94+ messages in thread
From: Sage Weil @ 2011-03-10 23:40 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 10 Mar 2011, Jim Schutt wrote:
> 
> On Thu, 2011-03-10 at 16:21 -0700, Sage Weil wrote:
> > On Thu, 10 Mar 2011, Jim Schutt wrote:
> > > On Wed, 2011-03-09 at 12:37 -0700, Gregory Farnum wrote:
> > > > On Wednesday, March 9, 2011 at 10:36 AM, Jim Schutt wrote:
> > > > > Here's another example with more debugging. The
> > > > > PG count during this interval is:
> > > > > 
> > > > > 2011-03-09 10:35:58.306942 pg v379: 25344 pgs: 25344 active+clean; 12119 MB data, 12025 MB used, 44579 GB / 44787 GB avail
> > > > > 2011-03-09 10:36:42.177728 pg v462: 25344 pgs: 25344 active+clean; 46375 MB data, 72672 MB used, 44520 GB / 44787 GB avail
> > > > > 
> > > > > Check out the interval 10:36:23.473356 -- 10:36:27.922262
> > > > > 
> > > > > It looks to me like a heartbeat message submission is 
> > > > > waiting on something?
> > > > 
> > > > Yes, it sure does. The only thing that should block between those output 
> > > > messages is getting the messenger lock, which *ought* be fast. Either 
> > > > there are a lot of threads trying to send messages and the heartbeat 
> > > > thread is just getting unlucky, or there's a mistake in where and how 
> > > > the messenger locks (which is certainly possible, but in a brief 
> > > > audit it looks correct).
> > > 
> > > Or, delete is broken on my systems.  With some extra diagnostics, 
> > > I get many instances of this sort of thing:
> > > 
> > > osd.10.log:946307:2011-03-10 15:38:38.519444 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd17 172.17.40.23:6805/8181 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9ac4041f0
> > > osd.10.log:946348:2011-03-10 15:38:38.520124 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer encoding 310 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> > > osd.10.log:946349:2011-03-10 15:38:38.520142 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).writer sending 310 0x7fe9ac4041f0
> > > osd.10.log:946350:2011-03-10 15:38:38.520156 7fe9c83c3940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).write_message 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> > > osd.10.log:949167:2011-03-10 15:38:38.800447 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).reader got ack seq 310 >= 310 on 0x7fe9ac4041f0 osd_ping(e13 as_of 13) v1
> > > osd.10.log:954385:2011-03-10 15:38:46.184453 7fe9c8ccc940 RefCountedObject::put delete 0x7fe9ac4041f0 took 7.345873 secs!
> > > osd.10.log:954386:2011-03-10 15:38:46.184471 7fe9c8ccc940 -- 172.17.40.22:6808/16890 >> 172.17.40.23:6805/8181 pipe(0x16b24c0 sd=133 pgs=106 cs=1 l=0).handle_ack finished put on 0x7fe9ac4041f0
> > > 
> > > osd.10.log:954785:2011-03-10 15:38:46.192022 7fe9e1170940 -- 172.17.40.22:6808/16890 --> osd46 172.17.40.27:6820/12936 -- osd_ping(e13 as_of 13) v1 -- ?+0 0x7fe9b4823d30
> > > osd.10.log:955206:2011-03-10 15:38:46.205457 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer encoding 322 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> > > osd.10.log:955207:2011-03-10 15:38:46.205480 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).writer sending 322 0x7fe9b4823d30
> > > osd.10.log:955208:2011-03-10 15:38:46.205494 7fe9d0949940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).write_message 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> > > osd.10.log:960397:2011-03-10 15:38:46.833161 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).reader got ack seq 322 >= 322 on 0x7fe9b4823d30 osd_ping(e13 as_of 13) v1
> > > osd.10.log:969858:2011-03-10 15:38:58.211206 7fe9d0444940 RefCountedObject::put delete 0x7fe9b4823d30 took 11.378036 secs!
> > > osd.10.log:969859:2011-03-10 15:38:58.211219 7fe9d0444940 -- 172.17.40.22:6808/16890 >> 172.17.40.27:6820/12936 pipe(0x16477c0 sd=99 pgs=74 cs=1 l=0).handle_ack finished put on 0x7fe9b4823d30
> > > 
> > > Since handle_ack() is under pipe_lock, heartbeat() cannot
> > > queue new osd_ping messages until Message::put() completes,
> > > right?
> > 
> > Right.
> > 
> > > It turns out my systems don't have tcmalloc.  Do you
> > > think using it would help?
> > 
> > Hmm, maybe.  I wouldn't expect this behavior from any allocator, though!
> > 
> > Can you drill down a bit further and see if either of these is 
> > responsible?
> > 
> >   virtual ~Message() { 
> >     assert(nref.read() == 0);
> >     if (connection)
> >       connection->put();
> >     if (throttler)
> >       throttler->put(payload.length() + middle.length() + data.length());
> >   }
> > 
> > (msg/Message.h)
> 
> Hmmm, this is the patch I'm running to produce above.
> It seems pretty definitive to me; am I missing something?

	delete this;

is calling the virtual destructor ~MOSDPing(), and then ~Message(), and 
only then releasing the memory to the allocator.  ~MOSDPing doesn't do 
anything, but ~Message adjusts the throttler (which involves a mutex that 
*shouldn't* be contended :) and a connection->put(), which calls 
~Connection() and then releases memory.

My money is on the throttler, but let's see!
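
If it helps, one way to wrap each of those steps without repeating the clock
calls is a small scoped timer -- a sketch only, not Ceph code, using
clock_gettime(CLOCK_MONOTONIC) so a stepped wall clock can't skew the numbers:

// scoped_timer.h -- sketch; needs -lrt on older glibc
#include <stdio.h>
#include <time.h>

struct ScopedTimer {
  const char *what;
  double warn_secs;
  struct timespec start;

  ScopedTimer(const char *w, double warn) : what(w), warn_secs(warn) {
    clock_gettime(CLOCK_MONOTONIC, &start);
  }
  ~ScopedTimer() {
    struct timespec end;
    clock_gettime(CLOCK_MONOTONIC, &end);
    double dt = (end.tv_sec - start.tv_sec) +
                (end.tv_nsec - start.tv_nsec) / 1e9;
    if (dt > warn_secs)
      fprintf(stderr, "%s took %f secs!\n", what, dt);
  }
};

// hypothetical use inside ~Message():
//   { ScopedTimer t("connection->put", 0.25); connection->put(); }
//   { ScopedTimer t("throttler->put", 0.25);  throttler->put(...); }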

sage


> 
> (I moved handle_ack() implementation into .cc to make
> dout work via debug osd setting.)
> 
> From c103fc342eec412a041188031aff484c2fd3feea Mon Sep 17 00:00:00 2001
> From: Jim Schutt <jaschut@sandia.gov>
> Date: Thu, 10 Mar 2011 16:26:43 -0700
> Subject: [PATCH] Instrument ack handling.
> 
> ---
>  src/msg/Message.h          |    8 +++++++-
>  src/msg/SimpleMessenger.cc |   17 +++++++++++++++++
>  src/msg/SimpleMessenger.h  |   15 +--------------
>  3 files changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/src/msg/Message.h b/src/msg/Message.h
> index 3758b1b..ac32b94 100644
> --- a/src/msg/Message.h
> +++ b/src/msg/Message.h
> @@ -154,8 +154,14 @@ struct RefCountedObject {
>    }
>    void put() {
>      //generic_dout(0) << "RefCountedObject::put " << this << " " << nref.read() << " -> " << (nref.read() - 1) << dendl;
> -    if (nref.dec() == 0)
> +    if (nref.dec() == 0) {
> +      utime_t s = g_clock.now();
>        delete this;
> +      utime_t e = g_clock.now();
> +      if (e - s > 0.5) {
> +	generic_dout(1) << "RefCountedObject::put delete " << this << " took " << e - s << " secs!" << dendl;
> +      }
> +    }
>    }
>  };
>  
> diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc
> index 7df3d44..a86ced8 100644
> --- a/src/msg/SimpleMessenger.cc
> +++ b/src/msg/SimpleMessenger.cc
> @@ -2243,6 +2243,23 @@ int SimpleMessenger::Pipe::write_message(Message *m)
>    goto out;
>  }
>  
> +/* Clean up sent list */
> +void SimpleMessenger::Pipe::handle_ack(uint64_t seq)
> +{
> +  dout(15) << "reader got ack seq " << seq << dendl;
> +  // trim sent list
> +  while (!sent.empty() &&
> +	 sent.front()->get_seq() <= seq) {
> +    Message *m = sent.front();
> +    sent.pop_front();
> +    dout(10) << "reader got ack seq "
> +	     << seq << " >= " << m->get_seq() << " on " << m << " " << *m << dendl;
> +    m->put();
> +    dout(20) << "handle_ack finished put on " << m << dendl;
> +  }
> +}
> +
> +
>  
>  /********************************************
>   * SimpleMessenger
> diff --git a/src/msg/SimpleMessenger.h b/src/msg/SimpleMessenger.h
> index d6ee0df..4031836 100644
> --- a/src/msg/SimpleMessenger.h
> +++ b/src/msg/SimpleMessenger.h
> @@ -174,20 +174,7 @@ private:
>      void fail();
>  
>      void was_session_reset();
> -
> -    /* Clean up sent list */
> -    void handle_ack(uint64_t seq) {
> -      dout(15) << "reader got ack seq " << seq << dendl;
> -      // trim sent list
> -      while (!sent.empty() &&
> -          sent.front()->get_seq() <= seq) {
> -        Message *m = sent.front();
> -        sent.pop_front();
> -        dout(10) << "reader got ack seq "
> -            << seq << " >= " << m->get_seq() << " on " << m << " " << *m << dendl;
> -        m->put();
> -      }
> -    }
> +    void handle_ack(uint64_t seq);
>  
>      // threads
>      class Reader : public Thread {
> -- 
> 1.6.6
> 
> -- Jim
> 
> > 
> > Thanks!
> > sage
> > 
> 
> 


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-10 23:40                             ` Sage Weil
@ 2011-03-11 14:51                               ` Jim Schutt
  2011-03-11 18:26                               ` Jim Schutt
  1 sibling, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 14:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-03-10 at 16:40 -0700, Sage Weil wrote:
> > > Hmm, maybe.  I wouldn't expect this behavior from any allocator, though!
> > >
> > > Can you drill down a bit further and see if either of these is
> > > responsible?
> > >
> > >   virtual ~Message() {
> > >     assert(nref.read() == 0);
> > >     if (connection)
> > >       connection->put();
> > >     if (throttler)
> > >       throttler->put(payload.length() + middle.length() + data.length());
> > >   }
> > >
> > > (msg/Message.h)
> >
> > Hmmm, this is the patch I'm running to produce above.
> > It seems pretty definitive to me; am I missing something?
> 
>         delete this;
> 
> is calling the virtual destructor ~MOSDPing(), and then ~Message(),
> and
> only then releasing the memory to the allocator. 

Doh!!!  Sorry, I should have been doing more thinking
and less typing last night.

>  ~MOSDPing doesn't do
> anything, but ~Message adjusts the throttler (which involves a mutex
> that
> *shouldn't* be contended :) and a connection->put(), which calls
> ~Connection() and then releases memory.
> 
> My money is on the throttler, but let's see!

OK, I see what you're after.....

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-10 23:40                             ` Sage Weil
  2011-03-11 14:51                               ` Jim Schutt
@ 2011-03-11 18:26                               ` Jim Schutt
  2011-03-11 18:37                                 ` Jim Schutt
  2011-03-11 18:37                                 ` Sage Weil
  1 sibling, 2 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 18:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Thu, 2011-03-10 at 16:40 -0700, Sage Weil wrote:
> > > Hmm, maybe.  I wouldn't expect this behavior from any allocator, though!
> > >
> > > Can you drill down a bit further and see if either of these is
> > > responsible?
> > >
> > >   virtual ~Message() {
> > >     assert(nref.read() == 0);
> > >     if (connection)
> > >       connection->put();
> > >     if (throttler)
> > >       throttler->put(payload.length() + middle.length() + data.length());
> > >   }
> > >
> > > (msg/Message.h)
> >
> > Hmmm, this is the patch I'm running to produce above.
> > It seems pretty definitive to me; am I missing something?
> 
>         delete this;
> 
> is calling the virtual destructor ~MOSDPing(), and then ~Message(), and
> only then releasing the memory to the allocator.  ~MOSDPing doesn't do
> anything, but ~Message adjusts the throttler (which involves a mutex that
> *shouldn't* be contended :) and a connection->put(), which calls
> ~Connection() and then releases memory.
> 
> My money is on the throttler, but let's see!

I added this patch:

--- a/src/msg/Message.h
+++ b/src/msg/Message.h
@@ -158,8 +158,8 @@ struct RefCountedObject {
       utime_t s = g_clock.now();
       delete this;
       utime_t e = g_clock.now();
-      if (e - s > 0.5) {
-       generic_dout(1) << "RefCountedObject::put delete " << this << " took " << e - s << " secs!" << dendl;
+      if (double(e - s) > 0.5) {
+       generic_dout(1) << "RefCountedObject::put delete " << this << " took " << double(e - s) << " secs!" << dendl;
       }
     }
   }
@@ -304,10 +304,22 @@ public:
 protected:
   virtual ~Message() { 
     assert(nref.read() == 0);
-    if (connection)
+    if (connection) {
+      utime_t s = g_clock.now();
       connection->put();
-    if (throttler)
+      utime_t e = g_clock.now();
+      if (double(e - s) > 0.25) {
+       generic_dout(1) << "~Message() " << this << " connection->put took " << double(e - s) << " secs!" << dendl;
+      }
+    }
+    if (throttler) {
+      utime_t s = g_clock.now();
       throttler->put(payload.length() + middle.length() + data.length());
+      utime_t e = g_clock.now();
+      if (double(e - s) > 0.25) {
+       generic_dout(1) << "~Message() " << this << " throttler->put took " << double(e - s) << " secs!" << dendl;
+      }
+    }
   }
 public:
   Connection *get_connection() { return connection; }



I got these hits:
# egrep -Hn --color -e "throttler->put took" osd.*.log
osd.16.log:1029077:2011-03-11 10:56:05.898264 7f30a7398940 ~Message() 0x2214ea0 throttler->put took 0.39543 secs!
osd.41.log:1401766:2011-03-11 10:56:51.110989 7ffad771e940 ~Message() 0x7ffaa7c12c40 throttler->put took 0.28708 secs!
osd.41.log:1494336:2011-03-11 10:57:07.748022 7ffad9123940 ~Message() 0x1adc8c0 throttler->put took 0.87857 secs!

Here are the corresponding messages:

osd.16.log:1011274:2011-03-11 10:55:48.805427 7f3089c9e940 -- 172.17.40.23:6800/22969 >> 172.17.40.68:0/1519385907 pipe(0x20c9930 sd=161 pgs=103 cs=1 l=1).reader got message 1 0x2214ea0 osd_op(client4212.1:39 10000004a4b.00000026 [write 0~4194304 [1@-1]] 0.2815 RETRY snapc 1=[])
osd.16.log:1027713:2011-03-11 10:56:05.679653 7f30a9d9f940 -- 172.17.40.23:6800/22969 dispatch_entry pipe 0x20c9930 dequeued 0x2214ea0
osd.16.log:1027714:2011-03-11 10:56:05.679666 7f30a9d9f940 -- 172.17.40.23:6800/22969 <== client4212 172.17.40.68:0/1519385907 1 ==== osd_op(client4212.1:39 10000004a4b.00000026 [write 0~4194304 [1@-1]] 0.2815 RETRY snapc 1=[]) ==== 128+0+4194304 (4010875753 0 0) 0x2214ea0 con 0x28b9e00
osd.16.log:1027717:2011-03-11 10:56:05.679688 7f30a9d9f940 osd16 14 _dispatch 0x2214ea0 osd_op(client4212.1:39 10000004a4b.00000026 [write 0~4194304 [1@-1]] 0.2815 RETRY snapc 1=[])
osd.16.log:1027718:2011-03-11 10:56:05.679703 7f30a9d9f940 osd16 14 require_same_or_newer_map 14 (i am 14) 0x2214ea0
osd.16.log:1027722:2011-03-11 10:56:05.679768 7f30a9d9f940 osd16 14 pg[0.815( v 14'6 (0'0,14'6] n=6 ec=3 les=6 3/3/3) [16,83] r=0 luod=14'5 lcod 14'5 mlcod 0'0 active+clean] enqueue_op 0x2214ea0 osd_op(client4212.1:39 10000004a4b.00000026 [write 0~4194304 [1@-1]] 0.2815 RETRY snapc 1=[])
osd.16.log:1027725:2011-03-11 10:56:05.679793 7f30a9d9f940 -- 172.17.40.23:6800/22969 dispatch_entry done with 0x2214ea0 que_et 16.874237 op_et 0.000117 tot_et 16.874354
osd.16.log:1029077:2011-03-11 10:56:05.898264 7f30a7398940 ~Message() 0x2214ea0 throttler->put took 0.39543 secs!
osd.16.log:1029078:2011-03-11 10:56:05.898287 7f30a7398940 osd16 14 dequeue_op 0x2214ea0 finish

osd.41.log:1400466:2011-03-11 10:56:50.926741 7ffab79cb940 -- 172.17.40.27:6803/25508 >> 172.17.40.97:0/893403283 pipe(0x7ffabc6eba50 sd=156 pgs=194 cs=1 l=1).reader got message 2 0x7ffaa7c12c40 osd_op(client4251.1:250 1000000cf3d.000000f4 [write 0~4194304 [1@-1]] 0.2c5b RETRY snapc 1=[])
osd.41.log:1401371:2011-03-11 10:56:51.069569 7ffad9924940 -- 172.17.40.27:6803/25508 dispatch_entry pipe 0x7ffabc6eba50 dequeued 0x7ffaa7c12c40
osd.41.log:1401372:2011-03-11 10:56:51.069584 7ffad9924940 -- 172.17.40.27:6803/25508 <== client4251 172.17.40.97:0/893403283 2 ==== osd_op(client4251.1:250 1000000cf3d.000000f4 [write 0~4194304 [1@-1]] 0.2c5b RETRY snapc 1=[]) ==== 128+0+4194304 (4206015051 0 0) 0x7ffaa7c12c40 con 0x7ffabc1c6640
osd.41.log:1401375:2011-03-11 10:56:51.069608 7ffad9924940 osd41 14 _dispatch 0x7ffaa7c12c40 osd_op(client4251.1:250 1000000cf3d.000000f4 [write 0~4194304 [1@-1]] 0.2c5b RETRY snapc 1=[])
osd.41.log:1401376:2011-03-11 10:56:51.069615 7ffad9924940 osd41 14 require_same_or_newer_map 14 (i am 14) 0x7ffaa7c12c40
osd.41.log:1401380:2011-03-11 10:56:51.069675 7ffad9924940 osd41 14 pg[0.c5b( v 14'8 (14'2,14'8] n=8 ec=3 les=5 3/3/3) [41,55] r=0 mlcod 14'3 active+clean] enqueue_op 0x7ffaa7c12c40 osd_op(client4251.1:250 1000000cf3d.000000f4 [write 0~4194304 [1@-1]] 0.2c5b RETRY snapc 1=[])
osd.41.log:1401383:2011-03-11 10:56:51.069704 7ffad9924940 -- 172.17.40.27:6803/25508 dispatch_entry done with 0x7ffaa7c12c40 que_et 0.142841 op_et 0.000110 tot_et 0.142951
osd.41.log:1401766:2011-03-11 10:56:51.110989 7ffad771e940 ~Message() 0x7ffaa7c12c40 throttler->put took 0.28708 secs!
osd.41.log:1401782:2011-03-11 10:56:51.111711 7ffad771e940 osd41 14 dequeue_op 0x7ffaa7c12c40 finish

osd.41.log:1406682:2011-03-11 10:56:52.368878 7ffab55a9940 -- 172.17.40.27:6803/25508 >> 172.17.40.49:0/530169199 pipe(0x1b03a90 sd=209 pgs=137 cs=1 l=1).reader got message 6 0x1adc8c0 osd_op(client4200.1:460 10000000bbb.000001c5 [write 0~4194304 [1@-1]] 0.a5b1 RETRY snapc 1=[])
osd.41.log:1439271:2011-03-11 10:56:59.443761 7ffad9924940 -- 172.17.40.27:6803/25508 dispatch_entry pipe 0x1b03a90 dequeued 0x1adc8c0
osd.41.log:1439272:2011-03-11 10:56:59.443782 7ffad9924940 -- 172.17.40.27:6803/25508 <== client4200 172.17.40.49:0/530169199 6 ==== osd_op(client4200.1:460 10000000bbb.000001c5 [write 0~4194304 [1@-1]] 0.a5b1 RETRY snapc 1=[]) ==== 128+0+4194304 (3016272799 0 0) 0x1adc8c0 con 0x8114be0
osd.41.log:1439275:2011-03-11 10:56:59.443806 7ffad9924940 osd41 21 _dispatch 0x1adc8c0 osd_op(client4200.1:460 10000000bbb.000001c5 [write 0~4194304 [1@-1]] 0.a5b1 RETRY snapc 1=[])
osd.41.log:1439276:2011-03-11 10:56:59.443816 7ffad9924940 osd41 21 require_same_or_newer_map 15 (i am 21) 0x1adc8c0
osd.41.log:1439306:2011-03-11 10:56:59.449276 7ffad9924940 -- 172.17.40.27:6803/25508 dispatch_entry done with 0x1adc8c0 que_et 7.074903 op_et 0.005482 tot_et 7.080385
osd.41.log:1493847:2011-03-11 10:57:07.658838 7ffad9123940 osd41 28 _dispatch 0x1adc8c0 osd_op(client4200.1:460 10000000bbb.000001c5 [write 0~4194304 [1@-1]] 0.a5b1 RETRY snapc 1=[])
osd.41.log:1493848:2011-03-11 10:57:07.658848 7ffad9123940 osd41 28 require_same_or_newer_map 15 (i am 28) 0x1adc8c0
osd.41.log:1494336:2011-03-11 10:57:07.748022 7ffad9123940 ~Message() 0x1adc8c0 throttler->put took 0.87857 secs!
osd.41.log:1494337:2011-03-11 10:57:07.748043 7ffad9123940 RefCountedObject::put delete 0x1adc8c0 took 0.87917 secs!


So none of those were osd_ping messages.

But, I still had lots of delayed acks.  Here's a couple more examples:

osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270

osd.0.log:964311:2011-03-11 10:55:33.448063 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd77 172.17.40.31:6817/25216 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc2f9a90
osd.0.log:964356:2011-03-11 10:55:33.448883 7fc4ba5a5940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6817/25216 pipe(0x27abca0 sd=108 pgs=81 cs=1 l=0).writer encoding 277 0x7fc4cc2f9a90 osd_ping(e14 as_of 14) v1
osd.0.log:964357:2011-03-11 10:55:33.448896 7fc4ba5a5940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6817/25216 pipe(0x27abca0 sd=108 pgs=81 cs=1 l=0).writer sending 277 0x7fc4cc2f9a90
osd.0.log:964358:2011-03-11 10:55:33.448907 7fc4ba5a5940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6817/25216 pipe(0x27abca0 sd=108 pgs=81 cs=1 l=0).write_message 0x7fc4cc2f9a90 osd_ping(e14 as_of 14) v1
osd.0.log:972337:2011-03-11 10:55:34.976054 7fc4ba8a8940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6817/25216 pipe(0x27abca0 sd=108 pgs=81 cs=1 l=0).reader got ack seq 278 >= 277 on 0x7fc4cc2f9a90 osd_ping(e14 as_of 14) v1
osd.0.log:977785:2011-03-11 10:55:45.119599 7fc4ba8a8940 RefCountedObject::put delete 0x7fc4cc2f9a90 took 11.4353 secs!
osd.0.log:977786:2011-03-11 10:55:45.119612 7fc4ba8a8940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6817/25216 pipe(0x27abca0 sd=108 pgs=81 cs=1 l=0).handle_ack finished put on 0x7fc4cc2f9a90

Seems like this pretty much rules out anything but the memory allocator,
but maybe I'm still missing something?
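
A standalone check of that hypothesis might look something like the program
below (not Ceph code; the file name, thread counts, and sizes are made up).
A few threads churn 4 MB buffers the way the client writes do, while small
blocks get allocated on one thread and freed on another, the way a Message
is; running it once against glibc malloc and once with tcmalloc via
LD_PRELOAD should show whether the allocator can really stall a small
cross-thread free for seconds:

// alloc_stall.cc -- standalone allocator probe, not Ceph code.
// build: g++ -O2 -pthread -std=c++11 alloc_stall.cc
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<bool> stop(false);

void churn() {
  // stand-in for the threads handling 4 MB client writes
  while (!stop) {
    char *p = (char *)malloc(4 << 20);
    if (p) {
      memset(p, 0, 4 << 20);   // touch the pages
      free(p);
    }
  }
}

int main() {
  std::vector<std::thread> writers;
  for (int i = 0; i < 8; i++)
    writers.push_back(std::thread(churn));

  const int N = 200000;
  std::vector<char *> blocks(N);
  for (int i = 0; i < N; i++)
    blocks[i] = (char *)malloc(256);   // small, like an osd_ping message

  double worst = 0;
  std::thread freer([&]() {
    // free them on a different thread, the way the pipe reader deletes
    // messages it did not allocate
    for (int i = 0; i < N; i++) {
      std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
      free(blocks[i]);
      std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
      if (dt.count() > worst)
        worst = dt.count();
    }
  });
  freer.join();

  stop = true;
  for (size_t i = 0; i < writers.size(); i++)
    writers[i].join();

  std::cout << "worst cross-thread free(): " << worst << " secs" << std::endl;
  return 0;
}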

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 18:26                               ` Jim Schutt
@ 2011-03-11 18:37                                 ` Jim Schutt
  2011-03-11 18:37                                 ` Sage Weil
  1 sibling, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 18:37 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-03-11 at 11:26 -0700, Jim Schutt wrote:
> 
> Seems like this pretty much rules out anything but the memory allocator,
> but maybe I'm still missing something?

Again with the typing instead of the thinking.

Off to look at the bufferlist destructor, and anything
else a Message holds.

-- Jim

> 
> -- Jim
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 18:26                               ` Jim Schutt
  2011-03-11 18:37                                 ` Jim Schutt
@ 2011-03-11 18:37                                 ` Sage Weil
  2011-03-11 18:51                                   ` Jim Schutt
  2011-03-11 21:13                                   ` Jim Schutt
  1 sibling, 2 replies; 94+ messages in thread
From: Sage Weil @ 2011-03-11 18:37 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Fri, 11 Mar 2011, Jim Schutt wrote:
> So none of those were osd_ping messages.
> 
> But, I still had lots of delayed acks.  Here's a couple more examples:
> 
> osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1

This bit has me confused:

> osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270

It looks like ~ .5 seconds has gone by for that thread, but the ::put 
debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
put' lines, though, right?  

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 18:37                                 ` Sage Weil
@ 2011-03-11 18:51                                   ` Jim Schutt
  2011-03-11 19:09                                     ` Gregory Farnum
  2011-03-11 21:13                                   ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 18:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
> > So none of those were osd_ping messages.
> > 
> > But, I still had lots of delayed acks.  Here's a couple more examples:
> > 
> > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> 
> This bit has me confused:
> 
> > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
> 
> It looks like ~ .5 seconds has gone by for that thread, but the ::put 
> debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
> put' lines, though, right?  

I'm also confused.  Here's the code I ran:

  void put() {
    //generic_dout(0) << "RefCountedObject::put " << this << " " << nref.read() << " -> " << (nref.read() - 1) << dendl;
    if (nref.dec() == 0) {
      utime_t s = g_clock.now();
      delete this;
      utime_t e = g_clock.now();
      if (double(e - s) > 0.5) {
	generic_dout(1) << "RefCountedObject::put delete " << this << " took " << double(e - s) << " secs!" << dendl;
      }
    }
  }

I added those double casts, because I had a similar problem with my throttler->put
test: without the casts, it was firing but the reported delay was less than
0.25 sec.  Adding the casts stopped that - I haven't yet checked into why.

Still checking for what I'm missing...

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 18:51                                   ` Jim Schutt
@ 2011-03-11 19:09                                     ` Gregory Farnum
  2011-03-11 19:13                                       ` Yehuda Sadeh Weinraub
  2011-03-11 19:16                                       ` Jim Schutt
  0 siblings, 2 replies; 94+ messages in thread
From: Gregory Farnum @ 2011-03-11 19:09 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, ceph-devel


On Friday, March 11, 2011 at 10:51 AM, Jim Schutt wrote:
> I'm also confused. Here's the code I ran:
> 
>  void put() {
>  //generic_dout(0) << "RefCountedObject::put " << this << " " << nref.read() << " -> " << (nref.read() - 1) << dendl;
>  if (nref.dec() == 0) {
>  utime_t s = g_clock.now();
>  delete this;
>  utime_t e = g_clock.now();
>  if (double(e - s) > 0.5) {
>  generic_dout(1) << "RefCountedObject::put delete " << this << " took " << double(e - s) << " secs!" << dendl;
>  }
>  }
>  }
> 
> I added those double casts, because I had a similar problem with my throttler->put
> test: without the casts, it was firing but the reported delay was less than
> 0.25 sec. Adding the casts stopped that - I haven't yet checked into why.
> 
> Still checking for what I'm missing...
> 
> -- Jim

Heh -- it turns out that operator double() is a little bit broken -- it divides by the wrong constant! Pushed the (very simple) fix in 3fb4fd8612b7a05f7d89cfd0b48f765c79830f95 to the stable branch.
This would be putting everything into the wrong order of magnitude. Half second delete issues are a lot more believable (to me at least!) than 5 second ones, and I wouldn't be surprised if they were the fault of the default memory allocator combined with the throttler getting bogged down. (Apparently even tcmalloc will do a better job of memory management if memory is freed from the same thread it's allocated in, and almost none of our memory use is polite in that way.)
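
Roughly, the conversion needs to end up with the right divisor -- here's a
sketch of the shape of it (the field names are made up; see the commit
above for the real change):

  #include <stdint.h>

  // Sketch only: the field names/layout are assumptions, not Ceph's
  // utime_t.  The point is that converting a sec/nsec pair to double
  // needs a 1e9 divisor; a divisor off by a factor of ten would inflate
  // every reported interval by 10x, which matches the ~0.5s vs 5s
  // discrepancy above.
  struct sketch_time {
    uint32_t sec, nsec;
    operator double() const {
      return double(sec) + double(nsec) / 1000000000.0;
    }
  };
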
-Greg





* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 19:09                                     ` Gregory Farnum
@ 2011-03-11 19:13                                       ` Yehuda Sadeh Weinraub
  2011-03-11 19:17                                         ` Yehuda Sadeh Weinraub
  2011-03-11 19:16                                       ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Yehuda Sadeh Weinraub @ 2011-03-11 19:13 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Jim Schutt, Sage Weil, ceph-devel

On Fri, Mar 11, 2011 at 11:09 AM, Gregory Farnum
<gregory.farnum@dreamhost.com> wrote:
>>>  utime_t e = g_clock.now();
>>  if (double(e - s) > 0.5) {
>

shouldn't that number be configurable?

Yehuda


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 19:09                                     ` Gregory Farnum
  2011-03-11 19:13                                       ` Yehuda Sadeh Weinraub
@ 2011-03-11 19:16                                       ` Jim Schutt
  1 sibling, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 19:16 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel


On Fri, 2011-03-11 at 12:09 -0700, Gregory Farnum wrote:
> On Friday, March 11, 2011 at 10:51 AM, Jim Schutt wrote:
> > I'm also confused. Here's the code I ran:
> > 
> >  void put() {
> >  //generic_dout(0) << "RefCountedObject::put " << this << " " << nref.read() << " -> " << (nref.read() - 1) << dendl;
> >  if (nref.dec() == 0) {
> >  utime_t s = g_clock.now();
> >  delete this;
> >  utime_t e = g_clock.now();
> >  if (double(e - s) > 0.5) {
> >  generic_dout(1) << "RefCountedObject::put delete " << this << " took " << double(e - s) << " secs!" << dendl;
> >  }
> >  }
> >  }
> > 
> > I added those double casts, because I had a similar problem with my throttler->put
> > test: without the casts, it was firing but the reported delay was less than
> > 0.25 sec. Adding the casts stopped that - I haven't yet checked into why.
> > 
> > Still checking for what I'm missing...
> > 
> > -- Jim
> 
> Heh -- it turns out that operator double() is a little bit broken -- it divides by the wrong constant! Pushed the (very simple) fix in 3fb4fd8612b7a05f7d89cfd0b48f765c79830f95 to the stable branch.
> This would be putting everything into the wrong order of magnitude. Half second delete issues are a lot more believable (to me at least!) than 5 second ones, and I wouldn't be surprised if they were the fault of the default memory allocator combined with the throttler getting bogged down. (Apparently even tcmalloc will do a better job of memory management if memory is freed from the same thread it's allocated in, and almost none of our memory use is polite in that way.)
> -Greg
> 
Re-running my tests....

-- Jim

> 
> 
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 19:13                                       ` Yehuda Sadeh Weinraub
@ 2011-03-11 19:17                                         ` Yehuda Sadeh Weinraub
  0 siblings, 0 replies; 94+ messages in thread
From: Yehuda Sadeh Weinraub @ 2011-03-11 19:17 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Jim Schutt, Sage Weil, ceph-devel

On Fri, Mar 11, 2011 at 11:13 AM, Yehuda Sadeh Weinraub
<yehudasa@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 11:09 AM, Gregory Farnum
> <gregory.farnum@dreamhost.com> wrote:
>>>>  utime_t e = g_clock.now();
>>>  if (double(e - s) > 0.5) {
>>
>
> shouldn't that number be configurable?
>
oh, sorry.. it's just an output check.. forget it!


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 18:37                                 ` Sage Weil
  2011-03-11 18:51                                   ` Jim Schutt
@ 2011-03-11 21:13                                   ` Jim Schutt
  2011-03-11 21:37                                     ` Sage Weil
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 21:13 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
> > So none of those were osd_ping messages.
> > 
> > But, I still had lots of delayed acks.  Here's a couple more examples:
> > 
> > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> 
> This bit has me confused:
> 
> > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
> 
> It looks like ~ .5 seconds has gone by for that thread, but the ::put 
> debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
> put' lines, though, right?  

OK, with Greg's utime_t double fix, I get no hits on

# egrep -Hn --color -e "->put took" osd.*.log


But, I still get lots of stalled RefCountedObject::put deletes:

# grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
8911

# grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
415

Here's a few egregious examples:

osd.15.log:1182213:2011-03-11 12:38:05.008947 7fa2e99a1940 -- 172.17.40.22:6823/10039 --> osd58 172.17.40.29:6808/6327 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7fa2c8958850
osd.15.log:1182548:2011-03-11 12:38:05.033223 7fa2e32f2940 -- 172.17.40.22:6823/10039 >> 172.17.40.29:6808/6327 pipe(0x7fa2e40203c0 sd=19 pgs=59 cs=1 l=0).writer encoding 294 0x7fa2c8958850 osd_ping(e5 as_of 5) v1
osd.15.log:1182549:2011-03-11 12:38:05.033243 7fa2e32f2940 -- 172.17.40.22:6823/10039 >> 172.17.40.29:6808/6327 pipe(0x7fa2e40203c0 sd=19 pgs=59 cs=1 l=0).writer sending 294 0x7fa2c8958850
osd.15.log:1182550:2011-03-11 12:38:05.033257 7fa2e32f2940 -- 172.17.40.22:6823/10039 >> 172.17.40.29:6808/6327 pipe(0x7fa2e40203c0 sd=19 pgs=59 cs=1 l=0).write_message 0x7fa2c8958850 osd_ping(e5 as_of 5) v1
osd.15.log:1184485:2011-03-11 12:38:05.240959 7fa2daf6f940 -- 172.17.40.22:6823/10039 >> 172.17.40.29:6808/6327 pipe(0x7fa2e40203c0 sd=19 pgs=59 cs=1 l=0).reader got ack seq 294 >= 294 on 0x7fa2c8958850 osd_ping(e5 as_of 5) v1
osd.15.log:1190478:2011-03-11 12:38:17.930663 7fa2daf6f940 RefCountedObject::put delete 0x7fa2c8958850 took 12.6896 secs!
osd.15.log:1190479:2011-03-11 12:38:17.930684 7fa2daf6f940 -- 172.17.40.22:6823/10039 >> 172.17.40.29:6808/6327 pipe(0x7fa2e40203c0 sd=19 pgs=59 cs=1 l=0).handle_ack finished put on 0x7fa2c8958850

osd.34.log:1082690:2011-03-11 12:38:32.845454 7f8983d33940 -- 172.17.40.25:6808/8345 --> osd87 172.17.40.32:6823/8229 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f897c4ca050
osd.34.log:1082754:2011-03-11 12:38:32.850109 7f8973777940 -- 172.17.40.25:6808/8345 >> 172.17.40.32:6823/8229 pipe(0x7f897c21bd40 sd=102 pgs=52 cs=1 l=0).writer encoding 326 0x7f897c4ca050 osd_ping(e5 as_of 5) v1
osd.34.log:1082755:2011-03-11 12:38:32.850123 7f8973777940 -- 172.17.40.25:6808/8345 >> 172.17.40.32:6823/8229 pipe(0x7f897c21bd40 sd=102 pgs=52 cs=1 l=0).writer sending 326 0x7f897c4ca050
osd.34.log:1082756:2011-03-11 12:38:32.850135 7f8973777940 -- 172.17.40.25:6808/8345 >> 172.17.40.32:6823/8229 pipe(0x7f897c21bd40 sd=102 pgs=52 cs=1 l=0).write_message 0x7f897c4ca050 osd_ping(e5 as_of 5) v1
osd.34.log:1083617:2011-03-11 12:38:33.055826 7f8972c6c940 -- 172.17.40.25:6808/8345 >> 172.17.40.32:6823/8229 pipe(0x7f897c21bd40 sd=102 pgs=52 cs=1 l=0).reader got ack seq 326 >= 326 on 0x7f897c4ca050 osd_ping(e5 as_of 5) v1
osd.34.log:1094501:2011-03-11 12:38:57.023129 7f8972c6c940 RefCountedObject::put delete 0x7f897c4ca050 took 23.9673 secs!
osd.34.log:1094503:2011-03-11 12:38:57.023162 7f8972c6c940 -- 172.17.40.25:6808/8345 >> 172.17.40.32:6823/8229 pipe(0x7f897c21bd40 sd=102 pgs=52 cs=1 l=0).handle_ack finished put on 0x7f897c4ca050

osd.36.log:1020000:2011-03-11 12:38:31.508458 7f76551b6940 -- 172.17.40.25:6814/8529 --> osd61 172.17.40.29:6817/6627 -- osd_ping(e5 as_of 5) v1 -- ?+0 0x7f762842cef0
osd.36.log:1020336:2011-03-11 12:38:31.725684 7f764cbcb940 -- 172.17.40.25:6814/8529 >> 172.17.40.29:6817/6627 pipe(0x7f765001a9a0 sd=39 pgs=43 cs=1 l=0).writer encoding 314 0x7f762842cef0 osd_ping(e5 as_of 5) v1
osd.36.log:1020337:2011-03-11 12:38:31.725697 7f764cbcb940 -- 172.17.40.25:6814/8529 >> 172.17.40.29:6817/6627 pipe(0x7f765001a9a0 sd=39 pgs=43 cs=1 l=0).writer sending 314 0x7f762842cef0
osd.36.log:1020338:2011-03-11 12:38:31.725708 7f764cbcb940 -- 172.17.40.25:6814/8529 >> 172.17.40.29:6817/6627 pipe(0x7f765001a9a0 sd=39 pgs=43 cs=1 l=0).write_message 0x7f762842cef0 osd_ping(e5 as_of 5) v1
osd.36.log:1025062:2011-03-11 12:38:33.451183 7f764cdcd940 -- 172.17.40.25:6814/8529 >> 172.17.40.29:6817/6627 pipe(0x7f765001a9a0 sd=39 pgs=43 cs=1 l=0).reader got ack seq 314 >= 314 on 0x7f762842cef0 osd_ping(e5 as_of 5) v1
osd.36.log:1038223:2011-03-11 12:38:57.171276 7f764cdcd940 RefCountedObject::put delete 0x7f762842cef0 took 23.7201 secs!
osd.36.log:1038224:2011-03-11 12:38:57.171288 7f764cdcd940 -- 172.17.40.25:6814/8529 >> 172.17.40.29:6817/6627 pipe(0x7f765001a9a0 sd=39 pgs=43 cs=1 l=0).handle_ack finished put on 0x7f762842cef0

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 21:13                                   ` Jim Schutt
@ 2011-03-11 21:37                                     ` Sage Weil
  2011-03-11 22:21                                       ` Jim Schutt
  2011-03-30 21:26                                       ` Jim Schutt
  0 siblings, 2 replies; 94+ messages in thread
From: Sage Weil @ 2011-03-11 21:37 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Fri, 11 Mar 2011, Jim Schutt wrote:
> 
> On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> > On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > So none of those were osd_ping messages.
> > > 
> > > But, I still had lots of delayed acks.  Here's a couple more examples:
> > > 
> > > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > 
> > This bit has me confused:
> > 
> > > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
> > 
> > It looks like ~ .5 seconds has gone by for that thread, but the ::put 
> > debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
> > put' lines, though, right?  
> 
> OK, with Greg's utime_t double fix, I get no hits on
> 
> # egrep -Hn --color -e "->put took" osd.*.log
> 
> 
> But, I still get lots of stalled RefCountedObject::put deletes:
> 
> # grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
> 8911
> 
> # grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
> 415

Hmm!  That does seem to point at the allocator, doesn't it.

Other threads are doing work during this long interval?  Including freeing 
memory, presumably, since basically everything uses the heap one way or 
another.  If it's the allocator, it's somehow affecting one thread only, 
which is pretty crazy.

Is it difficult for you to try this with tcmalloc?  That'll tell us 
something.

One other possibility would be to try to catch this "in the act" and send 
it a SIGABRT to get a core dump.  Then we can look in more detail at what 
this (and other) threads are up to.  I'm not sure how easy this is to 
catch on a particular node...

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 21:37                                     ` Sage Weil
@ 2011-03-11 22:21                                       ` Jim Schutt
  2011-03-11 22:26                                         ` Jim Schutt
  2011-03-30 21:26                                       ` Jim Schutt
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 22:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-03-11 at 14:37 -0700, Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
> > 
> > On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> > > On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > > So none of those were osd_ping messages.
> > > > 
> > > > But, I still had lots of delayed acks.  Here's a couple more examples:
> > > > 
> > > > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > > > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > > > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > 
> > > This bit has me confused:
> > > 
> > > > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > > > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
> > > 
> > > It looks like ~ .5 seconds has gone by for that thread, but the ::put 
> > > debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
> > > put' lines, though, right?  
> > 
> > OK, with Greg's utime_t double fix, I get no hits on
> > 
> > # egrep -Hn --color -e "->put took" osd.*.log
> > 
> > 
> > But, I still get lots of stalled RefCountedObject::put deletes:
> > 
> > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
> > 8911
> > 
> > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
> > 415
> 
> Hmm!  That does seem to point at the allocator, doesn't it.
> 
> Other threads are doing work during this long interval?  Including freeing 
> memory, presumably, since basically everything uses the heap one way or 
> another.  If it's the allocator, it's somehow affecting one thread only, 
> which is pretty crazy.

As an example,  consider this period:

osd.34.log:1083828:2011-03-11 12:38:33.091345 7f8972f6f940 -- 172.17.40.25:6808/8345 >> 172.17.40.30:6814/8599 pipe(0x7f897c186c70 sd=109 pgs=64 cs=1 l=0).reader got ack seq 326 >= 326 on 0x7f897c47cd00 osd_ping(e5 as_of 5) v1
osd.34.log:1096768:2011-03-11 12:38:57.143061 7f8972f6f940 RefCountedObject::put delete 0x7f897c47cd00 took 23.9321 secs!
osd.34.log:1096769:2011-03-11 12:38:57.143079 7f8972f6f940 -- 172.17.40.25:6808/8345 >> 172.17.40.30:6814/8599 pipe(0x7f897c186c70 sd=109 pgs=64 cs=1 l=0).handle_ack finished put on 0x7f897c47cd00

In that period 225 unique threads ran, 10887 lines of logging
were generated, of which 9985 contained "pipe".  So, lots of
network activity.  6881 lines of log contained "reader", 
1715 lines of log contained "writer".

And, one other instance of a stalled delete:

# egrep -Hn -B 1000000 "^2011-03-11 12:38:57.023162" osd.34.log | egrep -A 1000000 "^osd[^ ]*011-03-11 12:38:33.055826" | grep "put delete"
osd.34.log-1094490-2011-03-11 12:38:57.022896 7f8975c9c940 RefCountedObject::put delete 0x7f897c3484e0 took 16.0578 secs!
osd.34.log-1094501-2011-03-11 12:38:57.023129 7f8972c6c940 RefCountedObject::put delete 0x7f897c4ca050 took 23.9673 secs!

In fact, check this out:

osd.34.log:1021637:2011-03-11 12:38:16.680723 7f897a2e2940 RefCountedObject::put delete 0x7f897c27ed20 took 0.697538 secs!
osd.34.log:1021641:2011-03-11 12:38:16.680776 7f8982327940 RefCountedObject::put delete 0x7f897c3a4450 took 0.67412 secs!
osd.34.log:1021649:2011-03-11 12:38:16.680899 7f896e727940 RefCountedObject::put delete 0x7f897c414ad0 took 0.654786 secs!
osd.34.log:1021667:2011-03-11 12:38:16.681150 7f8972666940 RefCountedObject::put delete 0x7f897c26a810 took 0.71042 secs!
osd.34.log:1021681:2011-03-11 12:38:16.681374 7f897b9f9940 RefCountedObject::put delete 0x7f897c27a530 took 0.659059 secs!
osd.34.log:1021692:2011-03-11 12:38:16.681535 7f8981115940 RefCountedObject::put delete 0x7f897c414050 took 0.712788 secs!
osd.34.log:1021717:2011-03-11 12:38:16.681886 7f8972e6e940 RefCountedObject::put delete 0x7f897c39c1d0 took 0.712481 secs!
osd.34.log:1021744:2011-03-11 12:38:16.682309 7f8980408940 RefCountedObject::put delete 0x7f897c436050 took 0.718925 secs!
osd.34.log:1041356:2011-03-11 12:38:20.835840 7f8972969940 RefCountedObject::put delete 0x27e4960 took 0.653973 secs!
osd.34.log:1041376:2011-03-11 12:38:20.838095 7f8972666940 RefCountedObject::put delete 0x7f894c09e740 took 0.749359 secs!
osd.34.log:1041476:2011-03-11 12:38:20.848764 7f8972060940 RefCountedObject::put delete 0x2ce42e0 took 0.819086 secs!
osd.34.log:1041732:2011-03-11 12:38:20.861206 7f8981418940 RefCountedObject::put delete 0x7f897c3d1530 took 0.566482 secs!
osd.34.log:1068451:2011-03-11 12:38:28.167331 7f896fa3a940 RefCountedObject::put delete 0x7f895482f400 took 1.70164 secs!
osd.34.log:1094490:2011-03-11 12:38:57.022896 7f8975c9c940 RefCountedObject::put delete 0x7f897c3484e0 took 16.0578 secs!
osd.34.log:1094501:2011-03-11 12:38:57.023129 7f8972c6c940 RefCountedObject::put delete 0x7f897c4ca050 took 23.9673 secs!
osd.34.log:1095007:2011-03-11 12:38:57.041197 7f8975999940 RefCountedObject::put delete 0x7f897c13aa20 took 18.3308 secs!
osd.34.log:1095020:2011-03-11 12:38:57.041461 7f8971e5e940 RefCountedObject::put delete 0x7f897c4ca210 took 23.9684 secs!
osd.34.log:1095023:2011-03-11 12:38:57.041506 7f8970b4b940 RefCountedObject::put delete 0x7f897c4ca790 took 23.9683 secs!
osd.34.log:1095028:2011-03-11 12:38:57.041585 7f8973474940 RefCountedObject::put delete 0x7f897c4ca5d0 took 23.9673 secs!
osd.34.log:1095042:2011-03-11 12:38:57.041870 7f8979ddd940 RefCountedObject::put delete 0x7f897c4c4dc0 took 23.9619 secs!
osd.34.log:1095047:2011-03-11 12:38:57.041955 7f897b9f9940 RefCountedObject::put delete 0x7f897c4ca3d0 took 23.9488 secs!
osd.34.log:1095050:2011-03-11 12:38:57.042000 7f897a2e2940 RefCountedObject::put delete 0x7f897c47cac0 took 23.8742 secs!
osd.34.log:1095053:2011-03-11 12:38:57.042048 7f8979fdf940 RefCountedObject::put delete 0x7f897c4c44c0 took 23.7631 secs!
osd.34.log:1095055:2011-03-11 12:38:57.042083 7f8982327940 RefCountedObject::put delete 0x7f897c3e96d0 took 17.3037 secs!
osd.34.log:1095058:2011-03-11 12:38:57.042139 7f8978bcb940 RefCountedObject::put delete 0x7f894c8894a0 took 15.7804 secs!
osd.34.log:1095062:2011-03-11 12:38:57.042210 7f8975898940 RefCountedObject::put delete 0x7f894c00c010 took 15.5001 secs!
osd.34.log:1095284:2011-03-11 12:38:57.050616 7f897b6f6940 RefCountedObject::put delete 0x7f897c32e050 took 17.5251 secs!
osd.34.log:1095302:2011-03-11 12:38:57.050974 7f8973171940 RefCountedObject::put delete 0x7f897c4c4b80 took 16.4301 secs!
osd.34.log:1095425:2011-03-11 12:38:57.052941 7f8980408940 RefCountedObject::put delete 0x7f897c0ef9f0 took 13.3016 secs!
osd.34.log:1095433:2011-03-11 12:38:57.053154 7f8981418940 RefCountedObject::put delete 0x7f897c4c4700 took 23.7634 secs!
osd.34.log:1095438:2011-03-11 12:38:57.053243 7f897bafa940 RefCountedObject::put delete 0x7f897c4c42c0 took 23.737 secs!
osd.34.log:1095444:2011-03-11 12:38:57.053343 7f896e828940 RefCountedObject::put delete 0x7f894c4010a0 took 15.5752 secs!
osd.34.log:1095748:2011-03-11 12:38:57.087529 7f89769a9940 RefCountedObject::put delete 0x7f897c251aa0 took 15.4771 secs!
osd.34.log:1095770:2011-03-11 12:38:57.088979 7f896ea2a940 RefCountedObject::put delete 0x7f897c24a550 took 15.6479 secs!
osd.34.log:1095795:2011-03-11 12:38:57.089647 7f89797d7940 RefCountedObject::put delete 0x7f897c4c4940 took 23.9601 secs!
osd.34.log:1095852:2011-03-11 12:38:57.091027 7f89777b7940 RefCountedObject::put delete 0x7f894c83d640 took 12.1841 secs!
osd.34.log:1096768:2011-03-11 12:38:57.143061 7f8972f6f940 RefCountedObject::put delete 0x7f897c47cd00 took 23.9321 secs!
osd.34.log:1097052:2011-03-11 12:38:57.147997 7f898080c940 RefCountedObject::put delete 0x7f894c4142a0 took 16.1565 secs!


> 
> Is it difficult for you to try this with tcmalloc?  That'll tell us 
> something.

I don't think so.  But, I'm out next week so won't get to
it until Mar 21 :(

> 
> One other possibility would be to try to catch this "in the act" and send 
> it a SIGABRT to get a core dump.  Then we can look in more detail at what 
> this (and other) threads are up to.  I'm not sure how easy this is to 
> catch on a particular node...

So it occurs to me that one call to Message::put() entails many 
calls to buffer::ptr::release(), depending on what the message 
is, right?  Maybe time the "delete _raw" in there and assert() 
if it's too long?
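
Something along these lines wrapped around the delete in
buffer::ptr::release(), say (just a sketch -- the 1 second threshold is
arbitrary, same timing pattern as the RefCountedObject::put debug above):

  // sketch: time the existing "delete _raw" and abort instead of logging,
  // so we get a core dump while the stall is still in progress
  utime_t s = g_clock.now();
  delete _raw;
  utime_t e = g_clock.now();
  assert(double(e - s) < 1.0);
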

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 22:21                                       ` Jim Schutt
@ 2011-03-11 22:26                                         ` Jim Schutt
  2011-03-11 22:45                                           ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 22:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-03-11 at 15:21 -0700, Jim Schutt wrote:
> On Fri, 2011-03-11 at 14:37 -0700, Sage Weil wrote:
> > On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > 
> > > On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> > > > On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > > > So none of those were osd_ping messages.
> > > > > 
> > > > > But, I still had lots of delayed acks.  Here's a couple more examples:
> > > > > 
> > > > > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 -- 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14 as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > > > > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > > > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > > > > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > > 
> > > > This bit has me confused:
> > > > 
> > > > > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
> > > > > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940 RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > > > > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 -- 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53 pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
> > > > 
> > > > It looks like ~ .5 seconds has gone by for that thread, but the ::put 
> > > > debug says 5 seconds.  It happens between the 'got ack seq' and 'finished 
> > > > put' lines, though, right?  
> > > 
> > > OK, with Greg's utime_t double fix, I get no hits on
> > > 
> > > # egrep -Hn --color -e "->put took" osd.*.log
> > > 
> > > 
> > > But, I still get lots of stalled RefCountedObject::put deletes:
> > > 
> > > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
> > > 8911
> > > 
> > > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
> > > 415
> > 
> > Hmm!  That does seem to point at the allocator, doesn't it.
> > 
> > Other threads are doing work during this long interval?  Including freeing 
> > memory, presumably, since basically everything uses the heap one way or 
> > another.  If it's the allocator, it's somehow affecting one thread only, 
> > which is pretty crazy.
> 
> As an example,  consider this period:
> 
> osd.34.log:1083828:2011-03-11 12:38:33.091345 7f8972f6f940 -- 172.17.40.25:6808/8345 >> 172.17.40.30:6814/8599 pipe(0x7f897c186c70 sd=109 pgs=64 cs=1 l=0).reader got ack seq 326 >= 326 on 0x7f897c47cd00 osd_ping(e5 as_of 5) v1
> osd.34.log:1096768:2011-03-11 12:38:57.143061 7f8972f6f940 RefCountedObject::put delete 0x7f897c47cd00 took 23.9321 secs!
> osd.34.log:1096769:2011-03-11 12:38:57.143079 7f8972f6f940 -- 172.17.40.25:6808/8345 >> 172.17.40.30:6814/8599 pipe(0x7f897c186c70 sd=109 pgs=64 cs=1 l=0).handle_ack finished put on 0x7f897c47cd00
> 
> In that period 225 unique threads ran, 10887 lines of logging
> were generated, of which 9985 contained "pipe".  So, lots of
> network activity.  6881 lines of log contained "reader", 
> 1715 lines of log contained "writer".
> 
> And, one other instance of a stalled delete:
> 
> # egrep -Hn -B 1000000 "^2011-03-11 12:38:57.023162" osd.34.log | egrep -A 1000000 "^osd[^ ]*011-03-11 12:38:33.055826" | grep "put delete"
> osd.34.log-1094490-2011-03-11 12:38:57.022896 7f8975c9c940 RefCountedObject::put delete 0x7f897c3484e0 took 16.0578 secs!
> osd.34.log-1094501-2011-03-11 12:38:57.023129 7f8972c6c940 RefCountedObject::put delete 0x7f897c4ca050 took 23.9673 secs!


> 
> 
> > 
> > Is it difficult for you to try this with tcmalloc?  That'll tell us 
> > something.
> 
> I don't think so.  But, I'm out next week so won't get to
> it until Mar 21 :(
> 
> > 
> > One other possibility would be to try to catch this "in the act" and send 
> > it a SIGABRT to get a core dump.  Then we can look in more detail at what 
> > this (and other) threads are up to.  I'm not sure how easy this is to 
> > catch on a particular node...
> 
> So it occurs to me that one call to Message::put() entails many 
> calls to buffer::ptr::release(), depending on what the message 
> is, right?  Maybe time the "delete _raw" in there and assert() 
> if it's too long?

Also, any chance all incoming data is causing buffer_total_alloc
to be contended?  I don't have libatomic_ops either, so that
atomic_t is implemented via a pthread_spinlock_t, right?
How to check that?
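
For context, here's my understanding of that fallback, sketched from
memory (this is not the literal atomic.h, just the shape of it): every
inc/dec on an atomic_t takes a per-object pthread spinlock, so one hot
global counter like buffer_total_alloc is something every buffer
alloc/free has to serialize on.

  #include <pthread.h>

  // sketch of a spinlock-backed counter, as I understand the
  // no-libatomic_ops fallback to work (names and details assumed)
  class spinlock_counter {
    pthread_spinlock_t lock;
    long val;
  public:
    spinlock_counter(long v = 0) : val(v) {
      pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    }
    ~spinlock_counter() { pthread_spin_destroy(&lock); }
    long add(long d) {
      pthread_spin_lock(&lock);
      long r = (val += d);
      pthread_spin_unlock(&lock);
      return r;
    }
    long sub(long d) { return add(-d); }
  };
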

-- Jim

> 
> -- Jim
> 
> > 
> > sage
> > 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 22:26                                         ` Jim Schutt
@ 2011-03-11 22:45                                           ` Sage Weil
  2011-03-11 23:29                                             ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-11 22:45 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Fri, 11 Mar 2011, Jim Schutt wrote:
> > So it occurs to me that one call to Message::put() entails many 
> > calls to buffer::ptr::release(), depending on what the message 
> > is, right?  Maybe time the "delete _raw" in there and assert() 
> > if it's too long?
> 
> Also, any chance all incoming data is causing buffer_total_alloc
> to be contended?  I don't have libatomic_ops either, so that
> atomic_t is implemented via a pthread_spinlock_t, right?
> How to check that?

Hmm, it could be.  I pushed a nobuffer branch that compiles out the 
buffer_total_alloc accounting, if you want to give that a go.
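
(Paraphrasing the idea rather than the exact diff, with made-up flag and
helper names: the accounting gets wrapped so that when it's compiled out
the hot alloc/free path touches no shared counter at all.)

  #include <cstddef>

  // sketch only -- hypothetical names, not the real patch
  #ifdef CEPH_TRACK_BUFFER_ALLOC
  extern long buffer_total_alloc;            // the real one is an atomic_t
  static inline void note_alloc(size_t len)   { buffer_total_alloc += (long)len; }
  static inline void note_release(size_t len) { buffer_total_alloc -= (long)len; }
  #else
  static inline void note_alloc(size_t)   {}
  static inline void note_release(size_t) {}
  #endif
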

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 22:45                                           ` Sage Weil
@ 2011-03-11 23:29                                             ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-03-11 23:29 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel


On Fri, 2011-03-11 at 15:45 -0700, Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
> > > So it occurs to me that one call to Message::put() entails many 
> > > calls to buffer::ptr::release(), depending on what the message 
> > > is, right?  Maybe time the "delete _raw" in there and assert() 
> > > if it's too long?
> > 
> > Also, any chance all incoming data is causing buffer_total_alloc
> > to be contended?  I don't have libatomic_ops either, so that
> > atomic_t is implemented via a pthread_spinlock_t, right?
> > How to check that?
> 
> Hmm, it could be.  I pushed a nobuffer branch that compiles out the 
> buffer_total_alloc accounting, if you want to give that a go.

That seems to have helped, although it's not a complete solution.

I still had some OSDs reported as failed, but since I use

        osd min down reporters = 3
        osd min down reports = 2

only 1 OSD got marked down; it noticed quickly and marked
itself up, and my 64-client dd finished.  That's new for
me at 96 OSDs.

I saw this on this run:

# grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l 
192

# grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
12578

which compares to a previous run in an earlier email:


> > > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [1-9]\." | wc -l
> > > 8911
> > > 
> > > # grep -Hn RefCountedObject::put osd.*.log | egrep "took [0-9][0-9]\." | wc -l
> > > 415

-- Jim

> 
> sage
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-11 21:37                                     ` Sage Weil
  2011-03-11 22:21                                       ` Jim Schutt
@ 2011-03-30 21:26                                       ` Jim Schutt
  2011-03-30 21:55                                         ` Sage Weil
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-30 21:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi Sage,

Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
>> On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
>>> On Fri, 11 Mar 2011, Jim Schutt wrote:
>>>> So none of those were osd_ping messages.
>>>>
>>>> But, I still had lots of delayed acks.  Here's a couple more examples:


> 
> Hmm!  That does seem to point at the allocator, doesn't it.
> 
> Other threads are doing work during this long interval?  Including freeing 
> memory, presumably, since basically everything uses the heap one way or 
> another.  If it's the allocator, it's somehow affecting one thread only, 
> which is pretty crazy.
> 
> Is it difficult for you to try this with tcmalloc?  That'll tell us 
> something.

I finally had a chance to rerun this testing, using
tcmalloc (from google-perftools v1.7) and libatomic_ops (v1.2-2)
against current next branch (commit a2ec936a7cd1c).

I still get lots of slow RefCountedObject::put calls.

> 
> One other possibility would be to try to catch this "in the act" and send 
> it a SIGABRT to get a core dump.  Then we can look in more detail at what 
> this (and other) threads are up to.  I'm not sure how easy this is to 
> catch on a particular node...

I'll try this next, assuming that using an assert
in RefCountedObject::put() that "delete this" takes
less than, say, 1 second will catch the state of
the other threads at an interesting place.
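
I.e. the same instrumentation as before, with the log replaced by an
assert (sketch; the 1 second threshold is arbitrary):

  void put() {
    if (nref.dec() == 0) {
      utime_t s = g_clock.now();
      delete this;
      utime_t e = g_clock.now();
      // abort here so the core captures every thread mid-stall
      assert(double(e - s) < 1.0);
    }
  }
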

Does that sound OK?

-- Jim

> 
> sage
> 
> 



* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-30 21:26                                       ` Jim Schutt
@ 2011-03-30 21:55                                         ` Sage Weil
  2011-03-31 14:16                                           ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-30 21:55 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Wed, 30 Mar 2011, Jim Schutt wrote:
> > Hmm!  That does seem to point at the allocator, doesn't it.
> > 
> > Other threads are doing work during this long interval?  Including freeing
> > memory, presumably, since basically everything uses the heap one way or
> > another.  If it's the allocator, it's somehow affecting one thread only,
> > which is pretty crazy.
> > 
> > Is it difficult for you to try this with tcmalloc?  That'll tell us
> > something.
> 
> I finally had a chance to rerun this testing, using
> tcmalloc (from google-perftools v1.7) and libatomic_ops (v1.2-2)
> against current next branch (commit a2ec936a7cd1c).
> 
> I still get lots of slow RefCountedObject::put calls.

Argh...

We've verified there's no swapping going on, right?  The allocator isn't 
touching cold pages and waiting for them to be swapped in or something?

> > One other possibility would be to try to catch this "in the act" and send it
> > a SIGABRT to get a core dump.  Then we can look in more detail at what this
> > (and other) threads are up to.  I'm not sure how easy this is to catch on a
> > particular node...
> 
> I'll try this next, assuming that using an assert
> in RefCountedObject::put() that "delete this" takes
> less than, say, 1 second will catch the state of
> the other threads at an interesting place.
> 
> Does that sound OK?

I was actually suggesting we try to make it core dump inside the "delete 
this" and watching for a stall in progress and then sending SIGABRT to 
dump core in the act.  That way we verify it really is in the allocator 
(and maybe even see where).  That's a bit harder to set up, though!  

Dumping right after may still yield some useful info, but I'm less 
hopeful...

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-30 21:55                                         ` Sage Weil
@ 2011-03-31 14:16                                           ` Jim Schutt
  2011-03-31 16:25                                             ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-31 14:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Sage Weil wrote:
> On Wed, 30 Mar 2011, Jim Schutt wrote:
>>> Hmm!  That does seem to point at the allocator, doesn't it.
>>>
>>> Other threads are doing work during this long interval?  Including freeing
>>> memory, presumably, since basically everything uses the heap one way or
>>> another.  If it's the allocator, it's somehow affecting one thread only,
>>> which is pretty crazy.
>>>
>>> Is it difficult for you to try this with tcmalloc?  That'll tell us
>>> something.
>> I finally had a chance to rerun this testing, using
>> tcmalloc (from google-perftools v1.7) and libatomic_ops (v1.2-2)
>> against current next branch (commit a2ec936a7cd1c).
>>
>> I still get lots of slow RefCountedObject::put calls.
> 
> Argh...
> 
> We've verified there's no swapping going on, right?  The allocator isn't 
> touching cold pages and waiting for them to be swapped in or something?

There's no swap configured on these nodes ;)

> 
>>> One other possibility would be to try to catch this "in the act" and send it
>>> a SIGABRT to get a core dump.  Then we can look in more detail at what this
>>> (and other) threads are up to.  I'm not sure how easy this is to catch on a
>>> particular node...
>> I'll try this next, assuming that using an assert
>> in RefCountedObject::put() that "delete this" takes
>> less than, say, 1 second will catch the state of
>> the other threads at an interesting place.
>>
>> Does that sound OK?
> 
> I was actually suggesting we try to make it core dump inside the "delete 
> this" and watching for a stall in progress and then sending SIGABRT to 
> dump core in the act.  That way we verify it really is in the allocator 
> (and maybe even see where).  That's a bit harder to set up, though!  

Right, I couldn't think of how to automate that stall detection
during the stall, rather than after.  At least, I couldn't
think of how to do it without incurring possibly excessive
overhead, say by starting a timer on every "delete this".

> 
> Dumping right after may still yield some useful info, but I'm less 
> hopeful...

I thought I might try turning off all debugging, except a notice
that the "delete this" took too long.  This is easy to do, and
would tell us if allocator activity in support of debugging is
affecting operations.  It doesn't lead to any ideas for
improving the situation, though :/

Also, since I built tcmalloc from source, I thought I might
try to figure out what operation is taking too long there.
I'm hoping Ceph logging redirection is set up so that stdout
or stderr from tcmalloc would show up in my log files?

-- Jim

> 
> sage
> 
> 




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 14:16                                           ` Jim Schutt
@ 2011-03-31 16:25                                             ` Sage Weil
  2011-03-31 17:00                                               ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-31 16:25 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 31 Mar 2011, Jim Schutt wrote:
> > I was actually suggesting we try to make it core dump inside the "delete
> > this" and watching for a stall in progress and then sending SIGABRT to dump
> > core in the act.  That way we verify it really is in the allocator (and
> > maybe even see where).  That's a bit harder to set up, though!  
> 
> Right, I couldn't think of how to automate that stall detection
> during the stall, rather than after.  At least, I couldn't
> think of how to do it without incurring possibly excessive
> overhead, say by starting a timer on every "delete this".

Yeah.  I wonder if dumping core on a cosd right when it gets marked down 
would do the trick?  That should catch it ~20 seconds or whatever in the 
stall.  By watching for the "osdfoo marked down" messages from ceph -w?

> > Dumping right after may still yield some useful info, but I'm less
> > hopeful...
> 
> I thought I might try turning off all debugging, except a notice
> that the "delete this" took too long.  This is easy to do, and
> would tell us if allocator activity in support of debugging is
> affecting operations.  It doesn't lead to any ideas for
> improving the situation, though :/
> 
> Also, since I built tcmalloc from source, I thought I might
> try to figure out what operation is taking too long there.
> I'm hoping Ceph logging redirection is set up so that stdout
> or stderr from tcmalloc would show up in my log files?

Not with the default logging stuff.  However, you can run the daemons with 
'-d' and they will stay in the foreground and log to stderr.  Or -f will 
send the ceph logs to their usual locations, but the daemon won't fork and 
you can redirect stdout/stderr (with any tcmalloc stuff) wherever you 
like.

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 16:25                                             ` Sage Weil
@ 2011-03-31 17:00                                               ` Jim Schutt
  2011-03-31 17:10                                                 ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-31 17:00 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Sage Weil wrote:
> On Thu, 31 Mar 2011, Jim Schutt wrote:
>>> I was actually suggesting we try to make it core dump inside the "delete
>>> this" and watching for a stall in progress and then sending SIGABRT to dump
>>> core in the act.  That way we verify it really is in the allocator (and
>>> maybe even see where).  That's a bit harder to set up, though!  
>> Right, I couldn't think of how to automate that stall detection
>> during the stall, rather than after.  At least, I couldn't
>> think of how to do it without incurring possibly excessive
>> overhead, say by starting a timer on every "delete this".
> 
> Yeah.  I wonder if dumping core on a cosd right when it gets marked down 
> would do the trick?  That should catch it ~20 seconds or whatever in the 
> stall.  By watching for the "osdfoo marked down" messages from ceph -w?

What about making Cond::Wait() use pthread_cond_timedwait()
with a suitable timeout value, say 10 seconds, and asserting
on timeout?  Do you think there would be many legitimate 10
second delays in OSD processing?

If you think that's not a useful idea, I'll try something
as you suggest.  Since the trigger is most likely on a
different node from where I need to send the signal, I'm a
little worried that the ssh connect time will delay things
enough so that the core files won't be useful.

But I'll try it if we can't come up with something that
has a higher probability of success.

> 
>>> Dumping right after may still yield some useful info, but I'm less
>>> hopeful...
>> I thought I might try turning off all debugging, except a notice
>> that the "delete this" took too long.  This is easy to do, and
>> would tell us if allocator activity in support of debugging is
>> affecting operations.  It doesn't lead to any ideas for
>> improving the situation, though :/
>>

Hmmph.  Less debugging output seemed to make this worse, if
it changed anything at all.

-- Jim




* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 17:00                                               ` Jim Schutt
@ 2011-03-31 17:10                                                 ` Jim Schutt
  2011-03-31 17:24                                                   ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-31 17:10 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Jim Schutt wrote:
> Sage Weil wrote:
>> On Thu, 31 Mar 2011, Jim Schutt wrote:
>>>> I was actually suggesting we try to make it core dump inside the 
>>>> "delete
>>>> this" and watching for a stall in progress and then sending SIGABRT 
>>>> to dump
>>>> core in the act.  That way we verify it really is in the allocator (and
>>>> maybe even see where).  That's a bit harder to set up, though!  
>>> Right, I couldn't think of how to automate that stall detection
>>> during the stall, rather than after.  At least, I couldn't
>>> think of how to do it without incurring possibly excessive
>>> overhead, say by starting a timer on every "delete this".
>>
>> Yeah.  I wonder if dumping core on a cosd right when it gets marked 
>> down would do the trick?  That should catch it ~20 seconds or whatever 
>> in the stall.  By watching for the "osdfoo marked down" messages from 
>> ceph -w?
> 
> What about making Cond::Wait() use pthread_cond_timedwait()
> with a suitable timeout value, say 10 seconds, and asserting
> on timeout?  Do you think there would be many legitimate 10
> second delays in OSD processing?
> 

Or, I could make a Cond::WaitIntervalOrAbort(), and
use it just on the pipe lock, since that's the source
of the trouble.  Sound useful?
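
In standalone pthread terms, roughly this (a sketch of the idea, not the
real Cond class; the caller holds the mutex, as with any cond wait):

  #include <pthread.h>
  #include <time.h>
  #include <errno.h>
  #include <assert.h>

  // sketch: wait on a condition with a deadline, and abort on timeout so
  // the process dumps core while the stall is still in progress
  void wait_interval_or_abort(pthread_cond_t *c, pthread_mutex_t *m,
                              int timeout_sec) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += timeout_sec;
    int r = pthread_cond_timedwait(c, m, &ts);
    assert(r != ETIMEDOUT);
  }
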

-- Jim


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 17:10                                                 ` Jim Schutt
@ 2011-03-31 17:24                                                   ` Sage Weil
  2011-03-31 18:08                                                     ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-31 17:24 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 31 Mar 2011, Jim Schutt wrote:
> Jim Schutt wrote:
> > Sage Weil wrote:
> > > On Thu, 31 Mar 2011, Jim Schutt wrote:
> > > > > I was actually suggesting we try to make it core dump inside the
> > > > > "delete
> > > > > this" and watching for a stall in progress and then sending SIGABRT to
> > > > > dump
> > > > > core in the act.  That way we verify it really is in the allocator
> > > > > (and
> > > > > maybe even see where).  That's a bit harder to set up, though!  
> > > > Right, I couldn't think of how to automate that stall detection
> > > > during the stall, rather than after.  At least, I couldn't
> > > > think of how to do it without incurring possibly excessive
> > > > overhead, say by starting a timer on every "delete this".
> > > 
> > > Yeah.  I wonder if dumping core on a cosd right when it gets marked down
> > > would do the trick?  That should catch it ~20 seconds or whatever in the
> > > stall.  By watching for the "osdfoo marked down" messages from ceph -w?
> > 
> > What about making Cond::Wait() use pthread_cond_timedwait()
> > with a suitable timeout value, say 10 seconds, and asserting
> > on timeout?  Do you think there would be many legitimate 10
> > second delays in OSD processing?
> > 
> 
> Or, I could make a Cond::WaitIntervalOrAbort(), and
> use it just on the pipe lock, since that's the source
> of the trouble.  Sound useful?

Yeah that sounds like the way to go.. then you can hand pick the site(s) 
that is/are waiting a long time in this case and switch those to 
WaitIntervalOrAbort?  Hopefully the cond timer will go off despite 
whatever badness is going on in delete this...

sage


* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 17:24                                                   ` Sage Weil
@ 2011-03-31 18:08                                                     ` Jim Schutt
  2011-03-31 18:41                                                       ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-03-31 18:08 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Sage Weil wrote:
> On Thu, 31 Mar 2011, Jim Schutt wrote:
>> Jim Schutt wrote:
>>> Sage Weil wrote:
>>>> On Thu, 31 Mar 2011, Jim Schutt wrote:
>>>>>> I was actually suggesting we try to make it core dump inside the
>>>>>> "delete
>>>>>> this" and watching for a stall in progress and then sending SIGABRT to
>>>>>> dump
>>>>>> core in the act.  That way we verify it really is in the allocator
>>>>>> (and
>>>>>> maybe even see where).  That's a bit harder to set up, though!  
>>>>> Right, I couldn't think of how to automate that stall detection
>>>>> during the stall, rather than after.  At least, I couldn't
>>>>> think of how to do it without incurring possibly excessive
>>>>> overhead, say by starting a timer on every "delete this".
>>>> Yeah.  I wonder if dumping core on a cosd right when it gets marked down
>>>> would do the trick?  That should catch it ~20 seconds or whatever in the
>>>> stall.  By watching for the "osdfoo marked down" messages from ceph -w?
>>> What about making Cond::Wait() use pthread_cond_timedwait()
>>> with a suitable timeout value, say 10 seconds, and asserting
>>> on timeout?  Do you think there would be many legitimate 10
>>> second delays in OSD processing?
>>>
>> Or, I could make a Cond::WaitIntervalOrAbort(), and
>> use it just on the pipe lock, since that's the source
>> of the trouble.  Sound useful?
> 
> Yeah that sounds like the way to go.. then you can hand pick the site(s) 
> that is/are waiting a long time in this case and switch those to 
> WaitIntervalOrAbort?  Hopefully the cond timer will go off despite 
> whatever badness is going on in delete this...

Actually, it occurs to me Wait() isn't what I'm after:
that is used to wait some unknown time for some event.

I think instead I need to use TryLock() on the pipe_lock
in submit_message(), in a loop with a suitable sleep,
say 100us, and assert when it takes too long to acquire
the lock.

So, maybe add a Mutex::LockOrAbort(), and use it in
submit_message()?
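
Roughly, assuming Mutex wraps a plain pthread_mutex_t (sketch only;
the 100us sleep and the timeout are just the numbers above):

  #include <assert.h>
  #include <pthread.h>
  #include <unistd.h>

  struct Mutex {
    pthread_mutex_t m;
    Mutex() { pthread_mutex_init(&m, NULL); }

    bool TryLock() { return pthread_mutex_trylock(&m) == 0; }
    void Unlock()  { pthread_mutex_unlock(&m); }

    // Poll for the lock with a short sleep between attempts; abort if
    // it can't be taken within 'timeout_secs', so the core shows what
    // the current holder is stuck on.
    void LockOrAbort(int timeout_secs) {
      const useconds_t sleep_us = 100;            // 100us between tries
      long attempts = timeout_secs * 1000000L / sleep_us;
      for (long i = 0; i < attempts; i++) {
        if (TryLock())
          return;
        usleep(sleep_us);
      }
      assert(!"LockOrAbort: timed out acquiring lock");
    }
  };

In submit_message() that would replace the plain lock on the pipe
lock with something like pipe_lock.LockOrAbort(5).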

submit_message() is intended to return immediately, no?
And the issue is caused by heartbeat() being unable to
queue messages, so this sounds to me to be a useful
test.

Does that seem to have low enough overhead to
be useful?

-- Jim

> 
> sage
> 
> 


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 18:08                                                     ` Jim Schutt
@ 2011-03-31 18:41                                                       ` Sage Weil
  2011-04-01 22:38                                                         ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-03-31 18:41 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Thu, 31 Mar 2011, Jim Schutt wrote:
> Sage Weil wrote:
> > On Thu, 31 Mar 2011, Jim Schutt wrote:
> > > Jim Schutt wrote:
> > > > Sage Weil wrote:
> > > > > On Thu, 31 Mar 2011, Jim Schutt wrote:
> > > > > > > I was actually suggesting we try to make it core dump inside the
> > > > > > > "delete
> > > > > > > this" and watching for a stall in progress and then sending
> > > > > > > SIGABRT to
> > > > > > > dump
> > > > > > > core in the act.  That way we verify it really is in the allocator
> > > > > > > (and
> > > > > > > maybe even see where).  That's a bit harder to set up, though!  
> > > > > > Right, I couldn't think of how to automate that stall detection
> > > > > > during the stall, rather than after.  At least, I couldn't
> > > > > > think of how to do it without incurring possibly excessive
> > > > > > overhead, say by starting a timer on every "delete this".
> > > > > Yeah.  I wonder if dumping core on a cosd right when it gets marked
> > > > > down
> > > > > would do the trick?  That should catch it ~20 seconds or whatever in
> > > > > the
> > > > > stall.  By watching for the "osdfoo marked down" messages from ceph
> > > > > -w?
> > > > What about making Cond::Wait() use pthread_cond_timedwait()
> > > > with a suitable timeout value, say 10 seconds, and asserting
> > > > on timeout?  Do you think there would be many legitimate 10
> > > > second delays in OSD processing?
> > > > 
> > > Or, I could make a Cond::WaitIntervalOrAbort(), and
> > > use it just on the pipe lock, since that's the source
> > > of the trouble.  Sound useful?
> > 
> > Yeah that sounds like the way to go.. then you can hand pick the site(s)
> > that is/are waiting a long time in this case and switch those to
> > WaitIntervalOrAbort?  Hopefully the cond timer will go off despite whatever
> > badness is going on in delete this...
> 
> Actually, it occurs to me Wait() isn't what I'm after:
> that is used to wait some unknown time for some event.
> 
> I think instead I need to use TryLock() on the pipe_lock
> in submit_message(), in a loop with a suitable sleep,
> say 100us, and assert when it takes too long to acquire
> the lock.
> 
> So, maybe add a Mutex::LockOrAbort(), and use it in
> submit_message()?
> 
> submit_message() is intended to return immediately, no?
> And the issue is caused by heartbeat() being unable to
> queue messages, so this sounds to me to be a useful
> test.
> 
> Does that seem to have low enough overhead to
> be useful?

Yeah, that sounds right!

sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-03-31 18:41                                                       ` Sage Weil
@ 2011-04-01 22:38                                                         ` Jim Schutt
  0 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-04-01 22:38 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

[-- Attachment #1: Type: text/plain, Size: 1719 bytes --]

Sage Weil wrote:
> On Thu, 31 Mar 2011, Jim Schutt wrote:

>> I think instead I need to use TryLock() on the pipe_lock
>> in submit_message(), in a loop with a suitable sleep,
>> say 100us, and assert when it takes too long to acquire
>> the lock.
>>
>> So, maybe add a Mutex::LockOrAbort(), and use it in
>> submit_message()?
>>
>> submit_message() is intended to return immediately, no?
>> And the issue is caused by heartbeat() being unable to
>> queue messages, so this sounds to me to be a useful
>> test.
>>
>> Does that seem to have low enough overhead to
>> be useful?
> 
> Yeah, that sounds right!

I gave this a try with LockOrAbort using a 5 second
timeout.

When you ignore all the threads waiting on a condition
variable, or in poll, this is what is left:

# egrep -v "__poll|pthread_cond_|__lll_lock_wait" thread-ids.txt
   395 Thread 18894  0x00007f60f804dd83 in do_writev (fd=11, vector=0x7f60f328fd00, count=5) at ../sysdeps/unix/sysv/linux/writev.c:46
   350 Thread 23189  0x00007f60f8ee9f2b in sendmsg () from /lib64/libpthread.so.0
   69 Thread 20612  0x00007f60f8ee9f2b in sendmsg () from /lib64/libpthread.so.0
   41 Thread 20474  0x00007f60f8ee9f2b in sendmsg () from /lib64/libpthread.so.0
   17 Thread 20649  0x00007f60f8ee9f2b in sendmsg () from /lib64/libpthread.so.0
* 1 Thread 20155  0x00007f60f8eea9dd in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41

I attached some text files with gdb output for
the stack trace of the aborting thread, a list of the
thread ids, and all the thread traces.
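
For reference, per-thread output like the attachments can be pulled
from the core left by the abort with gdb in batch mode, along these
lines (binary and core paths are placeholders):

  gdb -batch -ex 'info threads' -ex 'thread apply all bt' \
      /usr/bin/cosd /path/to/core > thread-stacks.txt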

But, I haven't learned anything from this yet that helps
figure out the cause of the delay.

Can you think of anything I should try?

-- Jim

> 
> sage
> 
> 


[-- Attachment #2: thread-abort.txt.bz2 --]
[-- Type: application/x-bzip, Size: 929 bytes --]

[-- Attachment #3: thread-ids.txt.bz2 --]
[-- Type: application/x-bzip, Size: 2790 bytes --]

[-- Attachment #4: thread-stacks.txt.bz2 --]
[-- Type: application/x-bzip, Size: 11416 bytes --]

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-02-17  0:50     ` Sage Weil
  2011-02-17  0:54       ` Sage Weil
@ 2011-04-08 16:23       ` Jim Schutt
  2011-04-08 20:50         ` Sage Weil
  1 sibling, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-04-08 16:23 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi Sage,

Sage Weil wrote:
> On Wed, 16 Feb 2011, Jim Schutt wrote:
>> On Wed, 2011-02-16 at 14:40 -0700, Gregory Farnum wrote:
>>> On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote:
>>>> Hi,
>>>>
>>>> I've been testing v0.24.3 w/ 64 clients against
>>>> 1 mon, 1 mds, 96 osds. Under heavy write load I
>>>> see:
>>>>  [WRN] map e7 wrongly marked me down or wrong addr
>>>>
>>>> I was able to sort through the logs and discover that when
>>>> this happens I have large gaps (10 seconds or more) in osd
>>>> heartbeat processing. In those heartbeat gaps I've discovered
>>>> long periods (5-15 seconds) where an osd logs nothing, even
>>>> though I am running with debug osd/filestore/journal = 20.
>>>>
>>>> Is this a known issue?
>>> You're running on btrfs? 
>> Yep.
> 
> Are the cosd log files on the same btrfs volume as the btrfs data, or 
> elsewhere?  The heartbeat thread takes some pains to avoid any locks that 
> may be contended and to avoid any disk io, so in theory a btrfs stall 
> shouldn't affect anything.  We may have missed something.. do you have a 
> log showing this in action?

In the end, after all the various things I've tried, I think
that the root cause of this is relatively simple: I don't
have enough CPU cycles available on my servers to do the
amount of OSD processing required to service my client
load, given the number of OSDs per server I'm running.

With too much work and not enough cycles to do it, the
one real-time component of Ceph, heartbeat processing,
eventually must miss its deadline (no heartbeat "observed"
in osd_heartbeat_grace seconds), since it requires work
done by components (messengers, memory allocation system)
that don't provide real-time guarantees.

All of my experiences on this make perfect sense when
viewed from this perspective.

For example, when working with tcmalloc, I learned I
could compile it with CXXFLAGS=-DTCMALLOC_LARGE_PAGES,
which causes tcmalloc to allow objects up to 256k in
its thread caches, rather than the default 32k.  So I
used that in combination with a 256k stripe width, on
the theory that deallocating messages would mostly only
interact with the thread cache, but it didn't help.
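
Passing that define amounts to a rebuild of google-perftools roughly
like this, assuming its stock autotools build (the version here is a
placeholder):

  cd google-perftools-x.y
  ./configure CXXFLAGS=-DTCMALLOC_LARGE_PAGES
  make && make install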

When looking at thread stacks generated by my
Mutex::LockOrAbort trick with a 5 sec wait to acquire
the pipe_lock, I often saw threads waiting on the
DoutLocker mutex.  Since lots of Ceph debugging output
happens with other locks being held, debugging might
thus slow things down out of proportion to the processing
required to generate the log messages.  Yet, when I
configured no debugging, I saw no improvement; it might
be that things got a little worse.  This now makes sense
to me in light of my above hypothesis about not enough
available CPU cycles - there's still too much work to
do, even with no cycles spent on debugging output.

What I didn't see very often in my thread stacks were
stack frames from tcmalloc.  This doesn't make sense
to me if the memory allocation subsystem is the root
cause of my problem, but makes perfect sense if there's
not enough CPU cycles: not so much time is spent
deallocating memory, so it is caught in the act less
often by LockOrAbort.

What finally seemed to help at avoiding missed heartbeats
in my configuration was the following combination:
turning off debugging, running with these throttling parameters:
         osd client message size cap = 14000000
         client oc size =              14000000
         client oc max dirty =         35000000
         filestore queue max bytes =   35000000
         journal queue max bytes =     35000000
         ms dispatch throttle bytes =  14000000
         objecter inflight op bytes =  35000000
and using a 512k stripe width.

Evidently keeping a relatively small amount of data in
flight, in smaller chunks, allowed heartbeat processing to
hit its mark more often.  But, it only delayed things, it
didn't solve the problem.  This makes sense to me if the
root cause is that I don't have enough CPU cycles available
per OSD, because I didn't change the offered load.

So, in the short term I guess I need to run fewer cosd
instances per server.

If my analysis above is correct, do you think anything
can be gained by running the heartbeat and heartbeat
dispatcher threads as SCHED_RR threads?  Since tick() runs
heartbeat_check(), that would also need to be SCHED_RR,
or the heartbeats could arrive on time, but not checked
until it was too late.
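
Mechanically I'd expect that to be something like the following,
called from the threads in question (sketch only; the function name
is made up, and it needs CAP_SYS_NICE or root):

  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>
  #include <string.h>

  // Promote the calling thread to the SCHED_RR real-time class.
  // 'rt_priority' must lie between sched_get_priority_min(SCHED_RR)
  // and sched_get_priority_max(SCHED_RR).
  static void make_thread_realtime(int rt_priority)
  {
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = rt_priority;
    int r = pthread_setschedparam(pthread_self(), SCHED_RR, &sp);
    if (r != 0)
      fprintf(stderr, "pthread_setschedparam: %s\n", strerror(r));
  }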

Anyway, please let me know what you think of the above.

Thanks -- Jim

> 
> sage
> 
> 


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-08 16:23       ` Jim Schutt
@ 2011-04-08 20:50         ` Sage Weil
  2011-04-08 22:11           ` Jim Schutt
  0 siblings, 1 reply; 94+ messages in thread
From: Sage Weil @ 2011-04-08 20:50 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Fri, 8 Apr 2011, Jim Schutt wrote:
> Hi Sage,
> 
> Sage Weil wrote:
> > On Wed, 16 Feb 2011, Jim Schutt wrote:
> > > On Wed, 2011-02-16 at 14:40 -0700, Gregory Farnum wrote:
> > > > On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote:
> > > > > Hi,
> > > > > 
> > > > > I've been testing v0.24.3 w/ 64 clients against
> > > > > 1 mon, 1 mds, 96 osds. Under heavy write load I
> > > > > see:
> > > > >  [WRN] map e7 wrongly marked me down or wrong addr
> > > > > 
> > > > > I was able to sort through the logs and discover that when
> > > > > this happens I have large gaps (10 seconds or more) in osd
> > > > > heartbeat processing. In those heartbeat gaps I've discovered
> > > > > long periods (5-15 seconds) where an osd logs nothing, even
> > > > > though I am running with debug osd/filestore/journal = 20.
> > > > > 
> > > > > Is this a known issue?
> > > > You're running on btrfs? 
> > > Yep.
> > 
> > Are the cosd log files on the same btrfs volume as the btrfs data, or
> > elsewhere?  The heartbeat thread takes some pains to avoid any locks that
> > may be contended and to avoid any disk io, so in theory a btrfs stall
> > shouldn't affect anything.  We may have missed something.. do you have a log
> > showing this in action?
> 
> In the end, after all the various things I've tried, I think
> that the root cause of this is relatively simple: I don't
> have enough CPU cycles available on my servers to do the
> amount of OSD processing required to service my client
> load, given the number of OSDs per server I'm running.
> 
> With too much work and not enough cycles to do it, the
> one real-time component of Ceph, heartbeat processing,
> eventually must miss its deadline (no heartbeat "observed"
> in osd_heartbeat_grace seconds), since it requires work
> done by components (messengers, memory allocation system)
> that don't provide real-time guarantees.
> 
> All of my experiences on this make perfect sense when
> viewed from this perspective.
> 
> For example, when working with tcmalloc, I learned I
> could compile it with CXXFLAGS=-DTCMALLOC_LARGE_PAGES,
> which causes tcmalloc to allow objects up to 256k in
> its thread caches, rather than the default 32k.  So I
> used that in combination with a 256k stripe width, on
> the theory that deallocating messages would mostly only
> interact with the thread cache, but it didn't help.
> 
> When looking at thread stacks generated by my
> Mutex::LockOrAbort trick with a 5 sec wait to acquire
> the pipe_lock, I often saw threads waiting on the
> DoutLocker mutex.  Since lots of Ceph debugging output
> happens with other locks being held, debugging might
> thus slow things down out of proportion to the processing
> required to generate the log messages.  Yet, when I
> configured no debugging, I saw no improvement; it might
> be that things got a little worse.  This now makes sense
> to me in light of my above hypothesis about not enough
> available CPU cycles - there's still too much work to
> do, even with no cycles spent on debugging output.
> 
> What I didn't see very often in my thread stacks were
> stack frames from tcmalloc.  This doesn't make sense
> to me if the memory allocation subsystem is the root
> cause of my problem, but makes perfect sense if there's
> not enough CPU cycles: not so much time is spent
> deallocating memory, so it is caught in the act less
> often by LockOrAbort.
> 
> What finally seemed to help at avoiding missed heartbeats
> in my configuration was the following combination:
> turning off debugging, running with these throttling parameters:
>         osd client message size cap = 14000000
>         client oc size =              14000000
>         client oc max dirty =         35000000
>         filestore queue max bytes =   35000000
>         journal queue max bytes =     35000000
>         ms dispatch throttle bytes =  14000000
>         objecter inflight op bytes =  35000000
> and using a 512k stripe width.
> 
> Evidently keeping a relatively small amount of data in
> flight, in smaller chunks, allowed heartbeat processing to
> hit its mark more often.  But, it only delayed things, it
> didn't solve the problem.  This makes sense to me if the
> root cause is that I don't have enough CPU cycles available
> per OSD, because I didn't change the offered load.
> 
> So, in the short term I guess I need to run fewer cosd
> instances per server.

There is one other thing to look at, and that's the number of threads used 
by each cosd process.  Have you tried setting

	osd op threads = 1

(or even 0, although I haven't tested that recently).  That will limit the 
number of concurrent IOs in flight to the fs.  Setting it to 0 will avoid 
using a thread pool at all and will process the IO in the message dispatch 
thread (though we haven't tested that recently so there may be issues).
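
In ceph.conf terms that would be, e.g.:

	[osd]
	        ; or 0 to do the IO in the message dispatch thread
	        osd op threads = 1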

I would also be interested in seeing a system level profile (oprofile?) to 
see where CPU time is being spent.  There are likely low hanging fruit in 
the OSD that would reduce CPU overhead.

I guess the other thing that would help to confirm this is to just halve 
the number of OSDs on your machines in a test and see if the problem goes 
away.

> If my analysis above is correct, do you think anything
> can be gained by running the heartbeat and heartbeat
> dispatcher threads as SCHED_RR threads?  Since tick() runs
> heartbeat_check(), that would also need to be SCHED_RR,
> or the heartbeats could arrive on time, but not checked
> until it was too late.

That sounds worth trying.  I don't care much about the tick() thread, 
though... if the machine is loady and we can't check heartbeats that is at 
least fail-safe.  And hopefully other nodes are able to catch the slow 
guy.

In the meantime, it may also be prudent for us to lower our queue size 
thresholds.  The current numbers were all pulled out of a hat (100MB? 
Sure!).

sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-08 20:50         ` Sage Weil
@ 2011-04-08 22:11           ` Jim Schutt
  2011-04-08 23:10             ` Colin McCabe
                               ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: Jim Schutt @ 2011-04-08 22:11 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Sage Weil wrote:
> On Fri, 8 Apr 2011, Jim Schutt wrote:
>> Hi Sage,
>>
>> Sage Weil wrote:
>>> On Wed, 16 Feb 2011, Jim Schutt wrote:
>>>> On Wed, 2011-02-16 at 14:40 -0700, Gregory Farnum wrote:
>>>>> On Wednesday, February 16, 2011 at 1:25 PM, Jim Schutt wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been testing v0.24.3 w/ 64 clients against
>>>>>> 1 mon, 1 mds, 96 osds. Under heavy write load I
>>>>>> see:
>>>>>>  [WRN] map e7 wrongly marked me down or wrong addr
>>>>>>
>>>>>> I was able to sort through the logs and discover that when
>>>>>> this happens I have large gaps (10 seconds or more) in osd
>>>>>> heartbeat processing. In those heartbeat gaps I've discovered
>>>>>> long periods (5-15 seconds) where an osd logs nothing, even
>>>>>> though I am running with debug osd/filestore/journal = 20.
>>>>>>
>>>>>> Is this a known issue?
>>>>> You're running on btrfs? 
>>>> Yep.
>>> Are the cosd log files on the same btrfs volume as the btrfs data, or
>>> elsewhere?  The heartbeat thread takes some pains to avoid any locks that
>>> may be contended and to avoid any disk io, so in theory a btrfs stall
>>> shouldn't affect anything.  We may have missed something.. do you have a log
>>> showing this in action?
>> In the end, after all the various things I've tried, I think
>> that the root cause of this is relatively simple: I don't
>> have enough CPU cycles available on my servers to do the
>> amount of OSD processing required to service my client
>> load, given the number of OSDs per server I'm running.
>>
>> With too much work and not enough cycles to do it, the
>> one real-time component of Ceph, heartbeat processing,
>> eventually must miss its deadline (no heartbeat "observed"
>> in osd_heartbeat_grace seconds), since it requires work
>> done by components (messengers, memory allocation system)
>> that don't provide real-time guarantees.
>>
>> All of my experiences on this make perfect sense when
>> viewed from this perspective.
>>
>> For example, when working with tcmalloc, I learned I
>> could compile it with CXXFLAGS=-DTCMALLOC_LARGE_PAGES,
>> which causes tcmalloc to allow objects up to 256k in
>> its thread caches, rather than the default 32k.  So I
>> used that in combination with a 256k stripe width, on
>> the theory that deallocating messages would mostly only
>> interact with the thread cache, but it didn't help.
>>
>> When looking at thread stacks generated by my
>> Mutex::LockOrAbort trick with a 5 sec wait to acquire
>> the pipe_lock, I often saw threads waiting on the
>> DoutLocker mutex.  Since lots of Ceph debugging output
>> happens with other locks being held, debugging might
>> thus slow things down out of proportion to the processing
>> required to generate the log messages.  Yet, when I
>> configured no debugging, I saw no improvement; it might
>> be that things got a little worse.  This now makes sense
>> to me in light of my above hypothesis about not enough
>> available CPU cycles - there's still too much work to
>> do, even with no cycles spent on debugging output.
>>
>> What I didn't see very often in my thread stacks were
>> stack frames from tcmalloc.  This doesn't make sense
>> to me if the memory allocation subsystem is the root
>> cause of my problem, but makes perfect sense if there's
>> not enough CPU cycles: not so much time is spent
>> deallocating memory, so it is caught in the act less
>> often by LockOrAbort.
>>
>> What finally seemed to help at avoiding missed heartbeats
>> in my configuration was the following combination:
>> turning off debugging, running with these throttling parameters:
>>         osd client message size cap = 14000000
>>         client oc size =              14000000
>>         client oc max dirty =         35000000
>>         filestore queue max bytes =   35000000
>>         journal queue max bytes =     35000000
>>         ms dispatch throttle bytes =  14000000
>>         objecter inflight op bytes =  35000000
>> and using a 512k stripe width.
>>
>> Evidently keeping a relatively small amount of data in
>> flight, in smaller chunks, allowed heartbeat processing to
>> hit its mark more often.  But, it only delayed things, it
>> didn't solve the problem.  This makes sense to me if the
>> root cause is that I don't have enough CPU cycles available
>> per OSD, because I didn't change the offered load.
>>
>> So, in the short term I guess I need to run fewer cosd
>> instances per server.
> 
> There is one other thing to look at, and that's the number of threads used 
> by each cosd process.  Have you tried setting
> 
> 	osd op threads = 1
> 
> (or even 0, although I haven't tested that recently).  That will limit the 
> number of concurrent IOs in flight to the fs.  Setting it to 0 will avoid 
> using a thread pool at all and will process the IO in the message dispatch 
> thread (though we haven't tested that recently so there may be issues).

I'll try this 2nd, since it's easy.

> 
> I would also be interested in seeing a system level profile (oprofile?) to 
> see where CPU time is being spent.  There are likely low hanging fruit in 
> the OSD that would reduce CPU overhead.

This will take me a little while, since I need to learn
about the tools.  But since I need to learn about them
anyway, that's a good thing.

> 
> I guess the other thing that would help to confirm this is to just halve 
> the number of OSDs on your machines in a test and see if the problem goes 
> away.

I was going to try this first, exactly because it seems like
a definitive test.

> 
>> If my analysis above is correct, do you think anything
>> can be gained by running the heartbeat and heartbeat
>> dispatcher threads as SCHED_RR threads?  Since tick() runs
>> heartbeat_check(), that would also need to be SCHED_RR,
>> or the heartbeats could arrive on time, but not checked
>> until it was too late.
> 
> That sounds worth trying.  I don't care much about the tick() thread, 
> though... if the machine is loady and we can't check heartbeats that is at 
> least fail-safe.  And hopefully other nodes are able to catch the slow 
> guy.

The reason I mentioned tick(), though, is that I was hoping to avoid
causing extra work checking PGs (and maybe starting on re-replication,
depending on how long things drag on?), during the interval until an
OSD notices that it was marked down.  I.e. I want to avoid triggering
a cascade of erroneous failure diagnoses, because I worry about what
happens if there's a real hardware failure in that window.  E.g., I
don't yet understand all the algorithms enough to predict what will
happen if the OSD holding the primary copy of some object is
erroneously marked down, and then a disk for the OSD holding
the replica fails.  It seems like there has to be a window there
until the primary OSD is marked up again where read requests on
such objects cannot complete, assuming 2-way replication?

> 
> In the meantime, it may also be prudent for us to lower our queue size 
> thresholds.  The current numbers were all pulled out of a hat (100MB? 
> Sure!).

:)

FWIW, I wanted to check my reasoning for "ms dispatch throttle bytes"
with you.  I thought I found in the code it was used in the rep op
dispatcher thread, so since I am doing 2-way replication, on average
every OSD should see one rep op for each client write op it sees.
According to that theory, I should set
   (ms dispatch throttle bytes) = (replication level - 1)*(osd client message size cap)
Does that sound right to you?
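
Concretely, with 2-way replication and the 14000000 byte
"osd client message size cap" above, that rule gives

   ms dispatch throttle bytes = (2 - 1) * 14000000 = 14000000

which matches the value in the settings I listed earlier.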

Thanks -- Jim

> 
> sage
> 
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-08 22:11           ` Jim Schutt
@ 2011-04-08 23:10             ` Colin McCabe
  2011-04-11 14:41               ` Jim Schutt
  2011-04-11 20:14             ` Jim Schutt
  2011-04-11 21:18             ` Jim Schutt
  2 siblings, 1 reply; 94+ messages in thread
From: Colin McCabe @ 2011-04-08 23:10 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, Gregory Farnum, ceph-devel

On Fri, Apr 8, 2011 at 3:11 PM, Jim Schutt <jaschut@sandia.gov> wrote:
> Sage Weil wrote:
>
>>
>> I would also be interested in seeing a system level profile (oprofile?) to
>> see where CPU time is being spent.  There are likely low hanging fruit in
>> the OSD that would reduce CPU overhead.
>
> This will take me a little while, since I need to learn
> about the tools.  But since I need to learn about them
> anyway, that's a good thing.

oprofile is surprisingly easy to get started with. We have a wiki page about it:

http://ceph.newdream.net/wiki/Cpu_profiling
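
A minimal session with opcontrol looks roughly like this (the cosd
path is a placeholder; add --vmlinux=... if kernel samples are wanted):

  opcontrol --init
  opcontrol --no-vmlinux        # userspace-only profile
  opcontrol --start
  # ... run the write workload ...
  opcontrol --stop
  opreport -l /usr/bin/cosd | head -40
  opcontrol --shutdown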

>
>>
>> I guess the other thing that would help to confirm this is to just halve
>> the number of OSDs on your machines in a test and see if the problem goes
>> away.
>
> I was going to try this first, exactly because it seems like
> a definitive test.
>
>>
>>> If my analysis above is correct, do you think anything
>>> can be gained by running the heartbeat and heartbeat
>>> dispatcher threads as SCHED_RR threads?  Since tick() runs
>>> heartbeat_check(), that would also need to be SCHED_RR,
>>> or the heartbeats could arrive on time, but not checked
>>> until it was too late.

Thanks for the ideas. However, I doubt that making the OSD::tick()
thread SCHED_RR would really work.

The OSD::tick() code is taking locks all over the place. Since a bunch
of other threads besides the tick thread can be holding those locks,
this would soon result in priority inversion. Not to mention,
heartbeat_messenger has its own thread(s) which actually perform the
work of sending the heartbeat messages.

cheers,
Colin

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-08 23:10             ` Colin McCabe
@ 2011-04-11 14:41               ` Jim Schutt
  2011-04-11 16:25                 ` Sage Weil
  0 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-04-11 14:41 UTC (permalink / raw)
  To: Colin McCabe; +Cc: Sage Weil, Gregory Farnum, ceph-devel

Colin McCabe wrote:
> On Fri, Apr 8, 2011 at 3:11 PM, Jim Schutt <jaschut@sandia.gov> wrote:
>> Sage Weil wrote:
>>
>>> I would also be interested in seeing a system level profile (oprofile?) to
>>> see where CPU time is being spent.  There are likely low hanging fruit in
>>> the OSD that would reduce CPU overhead.
>> This will take me a little while, since I need to learn
>> about the tools.  But since I need to learn about them
>> anyway, that's a good thing.
> 
> oprofile is surprisingly easy to get started with. We have a wiki page about it:
> 
> http://ceph.newdream.net/wiki/Cpu_profiling

Cool, thanks.

> 
>>> I guess the other thing that would help to confirm this is to just halve
>>> the number of OSDs on your machines in a test and see if the problem goes
>>> away.
>> I was going to try this first, exactly because it seems like
>> a definitive test.
>>
>>>> If my analysis above is correct, do you think anything
>>>> can be gained by running the heartbeat and heartbeat
>>>> dispatcher threads as SCHED_RR threads?  Since tick() runs
>>>> heartbeat_check(), that would also need to be SCHED_RR,
>>>> or the heartbeats could arrive on time, but not checked
>>>> until it was too late.
> 
> Thanks for the ideas. However, I doubt that making the OSD::tick()
> thread SCHED_RR would really work.
> 
> The OSD::tick() code is taking locks all over the place. Since a bunch
> of other threads besides the tick thread can be holding those locks,
> this would soon result in priority inversion. Not to mention,
> heartbeat_messenger has its own thread(s) which actually perform the
> work of sending the heartbeat messages.

Yes, I think I understand.

-- Jim

> 
> cheers,
> Colin
> 
> 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-11 14:41               ` Jim Schutt
@ 2011-04-11 16:25                 ` Sage Weil
  0 siblings, 0 replies; 94+ messages in thread
From: Sage Weil @ 2011-04-11 16:25 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Colin McCabe, Gregory Farnum, ceph-devel

On Mon, 11 Apr 2011, Jim Schutt wrote:
> > > > I guess the other thing that would help to confirm this is to just halve
> > > > the number of OSDs on your machines in a test and see if the problem
> > > > goes
> > > > away.
> > > I was going to try this first, exactly because it seems like
> > > a definitive test.
> > > 
> > > > > If my analysis above is correct, do you think anything
> > > > > can be gained by running the heartbeat and heartbeat
> > > > > dispatcher threads as SCHED_RR threads?  Since tick() runs
> > > > > heartbeat_check(), that would also need to be SCHED_RR,
> > > > > or the heartbeats could arrive on time, but not checked
> > > > > until it was too late.
> > 
> > Thanks for the ideas. However, I doubt that making the OSD::tick()
> > thread SCHED_RR would really work.
> > 
> > The OSD::tick() code is taking locks all over the place. Since a bunch
> > of other threads besides the tick thread can be holding those locks,
> > this would soon result in priority inversion. Not to mention,
> > heartbeat_messenger has its own thread(s) which actually perform the
> > work of sending the heartbeat messages.
> 
> Yes, I think I understand.

We could set the priority for those threads as well, but I'm not sure that 
really addresses the problem: we may end up with a situation where cosd is 
responding to heartbeats but not doing useful work.  At some point you 
have to consider highly degraded service a failure.

Let's see if we can fix it without adjusting priorities first!

sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-08 22:11           ` Jim Schutt
  2011-04-08 23:10             ` Colin McCabe
@ 2011-04-11 20:14             ` Jim Schutt
  2011-04-11 21:18             ` Jim Schutt
  2 siblings, 0 replies; 94+ messages in thread
From: Jim Schutt @ 2011-04-11 20:14 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, Gregory Farnum, ceph-devel

Jim Schutt wrote:
> Sage Weil wrote:

> 
>>
>> I guess the other thing that would help to confirm this is to just 
>> halve the number of OSDs on your machines in a test and see if the 
>> problem goes away.
> 
> I was going to try this first, exactly because it seems like
> a definitive test.
> 

FWIW, I've done some testing on a file system using 48 OSDs
rather than 96.

With the 96-OSD version of this test (12 servers, 8 OSD/server),
with 64 clients writing a total of 128 GiB data, I usually see
multiple instances (5-6, or more, is common) of OSDs getting
marked down, noticing they were wrongly marked down, and coming back.

With the 48-OSD version of the file system (12 servers, 4 OSD/server)
I ran multiple tests, totaling several TiB data, and experienced
exactly one instance of an OSD being wrongly marked down.

-- Jim


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-08 22:11           ` Jim Schutt
  2011-04-08 23:10             ` Colin McCabe
  2011-04-11 20:14             ` Jim Schutt
@ 2011-04-11 21:18             ` Jim Schutt
  2011-04-11 23:23               ` Sage Weil
  2 siblings, 1 reply; 94+ messages in thread
From: Jim Schutt @ 2011-04-11 21:18 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Sage Weil, Gregory Farnum, ceph-devel

Jim Schutt wrote:
> Sage Weil wrote:
>> On Fri, 8 Apr 2011, Jim Schutt wrote:

>>> So, in the short term I guess I need to run fewer cosd
>>> instances per server.
>>
>> There is one other thing to look at, and that's the number of threads 
>> used by each cosd process.  Have you tried setting
>>
>>     osd op threads = 1
>>
>> (or even 0, although I haven't tested that recently).  That will limit 
>> the number of concurrent IOs in flight to the fs.  Setting it to 0 
>> will avoid using a thread pool at all and will process the IO in the 
>> message dispatch thread (though we haven't tested that recently so 
>> there may be issues).
> 
> I'll try this 2nd, since it's easy.
> 
      osd op threads = 0

didn't work for me at all - 20 of 96 OSDs aborted almost
immediately after startup.

      osd op threads = 1

didn't work very well either - one of my servers went OOM,
which hasn't happened since I started using my restricted
buffering parameters.

It really does seem like I'm just trying to do too much
work on each server.  If I back off to 4 OSDs/server on
my hardware, there are a few percent of idle cycles, making
interacting with it much more pleasant.

-- Jim


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: cosd multi-second stalls cause "wrongly marked me down"
  2011-04-11 21:18             ` Jim Schutt
@ 2011-04-11 23:23               ` Sage Weil
  0 siblings, 0 replies; 94+ messages in thread
From: Sage Weil @ 2011-04-11 23:23 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Gregory Farnum, ceph-devel

On Mon, 11 Apr 2011, Jim Schutt wrote:
> Jim Schutt wrote:
> > Sage Weil wrote:
> > > On Fri, 8 Apr 2011, Jim Schutt wrote:
> 
> > > > So, in the short term I guess I need to run fewer cosd
> > > > instances per server.
> > > 
> > > There is one other thing to look at, and that's the number of threads used
> > > by each cosd process.  Have you tried setting
> > > 
> > >     osd op threads = 1
> > > 
> > > (or even 0, although I haven't tested that recently).  That will limit the
> > > number of concurrent IOs in flight to the fs.  Setting it to 0 will avoid
> > > using a thread pool at all and will process the IO in the message dispatch
> > > thread (though we haven't tested that recently so there may be issues).
> > 
> > I'll try this 2nd, since it's easy.
> > 
>      osd op threads = 0
> 
> didn't work for me at all - 20 of 96 OSDs aborted almost
> immediately after startup.
> 
>      osd op threads = 1
> 
> didn't work very well either - one of my servers went OOM,
> which hasn't happened since I started using my restricted
> buffering parameters.

Debugging this turned up a refcounting leak that meant PGs were never 
freed.  That's fixed (and osd op threads = 0 and 1 work in my limited 
tests), but there may be other PG lifecycle related issues now that 
refcounting actually works.  :)

sage

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2011-04-11 23:19 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-16 21:25 cosd multi-second stalls cause "wrongly marked me down" Jim Schutt
2011-02-16 21:37 ` Wido den Hollander
2011-02-16 21:51   ` Jim Schutt
2011-02-16 21:40 ` Gregory Farnum
2011-02-16 21:50   ` Jim Schutt
2011-02-17  0:50     ` Sage Weil
2011-02-17  0:54       ` Sage Weil
2011-02-17 15:46         ` Jim Schutt
2011-02-17 16:11           ` Sage Weil
2011-02-17 23:31             ` Jim Schutt
2011-02-18  7:13               ` Sage Weil
2011-02-18 17:04                 ` Jim Schutt
2011-02-18 17:15                 ` Gregory Farnum
2011-02-18 18:41                 ` Jim Schutt
2011-02-18 19:07                 ` Colin McCabe
2011-02-18 20:48                   ` Jim Schutt
2011-02-18 20:58                     ` Sage Weil
2011-02-18 21:09                       ` Jim Schutt
2011-03-09 16:02               ` Jim Schutt
2011-03-09 17:07                 ` Gregory Farnum
2011-03-09 18:36                   ` Jim Schutt
2011-03-09 19:37                     ` Gregory Farnum
2011-03-10 23:09                       ` Jim Schutt
2011-03-10 23:21                         ` Sage Weil
2011-03-10 23:32                           ` Jim Schutt
2011-03-10 23:40                             ` Sage Weil
2011-03-11 14:51                               ` Jim Schutt
2011-03-11 18:26                               ` Jim Schutt
2011-03-11 18:37                                 ` Jim Schutt
2011-03-11 18:37                                 ` Sage Weil
2011-03-11 18:51                                   ` Jim Schutt
2011-03-11 19:09                                     ` Gregory Farnum
2011-03-11 19:13                                       ` Yehuda Sadeh Weinraub
2011-03-11 19:17                                         ` Yehuda Sadeh Weinraub
2011-03-11 19:16                                       ` Jim Schutt
2011-03-11 21:13                                   ` Jim Schutt
2011-03-11 21:37                                     ` Sage Weil
2011-03-11 22:21                                       ` Jim Schutt
2011-03-11 22:26                                         ` Jim Schutt
2011-03-11 22:45                                           ` Sage Weil
2011-03-11 23:29                                             ` Jim Schutt
2011-03-30 21:26                                       ` Jim Schutt
2011-03-30 21:55                                         ` Sage Weil
2011-03-31 14:16                                           ` Jim Schutt
2011-03-31 16:25                                             ` Sage Weil
2011-03-31 17:00                                               ` Jim Schutt
2011-03-31 17:10                                                 ` Jim Schutt
2011-03-31 17:24                                                   ` Sage Weil
2011-03-31 18:08                                                     ` Jim Schutt
2011-03-31 18:41                                                       ` Sage Weil
2011-04-01 22:38                                                         ` Jim Schutt
2011-02-23 17:52             ` Jim Schutt
2011-02-23 18:12               ` Gregory Farnum
2011-02-23 18:54                 ` Sage Weil
2011-02-23 19:12                   ` Gregory Farnum
2011-02-23 19:23                 ` Jim Schutt
2011-02-23 20:27                   ` Gregory Farnum
2011-03-02  0:53                   ` Sage Weil
2011-03-02 15:21                     ` Jim Schutt
2011-03-02 17:10                       ` Sage Weil
2011-03-02 20:54                         ` Jim Schutt
2011-03-02 21:45                           ` Sage Weil
2011-03-02 21:59                             ` Jim Schutt
2011-03-02 22:57                               ` Jim Schutt
2011-03-02 23:20                                 ` Gregory Farnum
2011-03-02 23:25                                   ` Jim Schutt
2011-03-02 23:33                                     ` Gregory Farnum
2011-03-03  2:26                                 ` Colin McCabe
2011-03-03 20:03                                   ` Jim Schutt
2011-03-03 20:47                                     ` Jim Schutt
2011-03-03 20:55                                       ` Yehuda Sadeh Weinraub
2011-03-03 21:45                                         ` Jim Schutt
2011-03-03 22:22                                           ` Sage Weil
2011-03-03 22:34                                             ` Jim Schutt
2011-03-03 21:53                                         ` Colin McCabe
2011-03-03 23:06                                           ` Jim Schutt
2011-03-03 23:30                                             ` Colin McCabe
2011-03-03 23:37                                               ` Jim Schutt
2011-03-03  5:03                                 ` Sage Weil
2011-03-03 16:35                                   ` Jim Schutt
2011-03-03 17:28                                   ` Jim Schutt
2011-03-03 18:04                                     ` Sage Weil
2011-03-03 18:42                                       ` Jim Schutt
2011-03-03 18:51                                         ` Sage Weil
2011-03-03 19:39                                           ` Jim Schutt
2011-04-08 16:23       ` Jim Schutt
2011-04-08 20:50         ` Sage Weil
2011-04-08 22:11           ` Jim Schutt
2011-04-08 23:10             ` Colin McCabe
2011-04-11 14:41               ` Jim Schutt
2011-04-11 16:25                 ` Sage Weil
2011-04-11 20:14             ` Jim Schutt
2011-04-11 21:18             ` Jim Schutt
2011-04-11 23:23               ` Sage Weil
