* strange benchmark problem: restarting osd daemons improves performance from 100k iops to 300k iops
From: Alexandre DERUMIER @ 2015-04-22  8:40 UTC (permalink / raw)
  To: ceph-devel, ceph-users

Hi,

While doing some benchmarks, I found a strange behaviour.

Using fio with the rbd engine, I was able to reach around 100k iops.
(osd data is in the linux buffer cache; iostat shows 0% disk access)

Then, after restarting all osd daemons, the same fio benchmark shows around 300k iops.
(osd data is in the linux buffer cache; iostat shows 0% disk access)


any ideas?
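
(For context: the fiorbd job file itself is not included in the thread. A minimal fio job consistent with the runs below (rbd engine, 4k random reads, iodepth 32, 10 jobs) might look like this sketch; clientname, pool and rbdname are placeholders, not values taken from the thread.)

    # fiorbd: hypothetical job file, consistent with the output below
    # clientname/pool/rbdname are assumptions, not values from the thread
    [rbd_iodepth32-test]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio_test
    rw=randread
    bs=4k
    iodepth=32
    numjobs=10
    direct=1

(It would be launched exactly as in the captures below: ./fio fiorbd)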




before restarting osd
---------------------
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
...
fio-2.2.7-10-g51e9
Starting 10 processes
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 iops] [eta 14m:45s]
fio: terminating on signal 2

rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015
  read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec
    slat (usec): min=5, max=3685, avg=16.89, stdev=17.38
    clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23
     lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42
    clat percentiles (usec):
     |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247],
     | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660],
     | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
     | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
     | 99.99th=[47360]
    bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95
    lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79%
    lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03%
    lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37%
    lat (msec) : 100=0.01%
  cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710
  IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
     issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, mint=26215msec, maxt=26215msec

Disk stats (read/write):
  sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
[root@ceph1-3 fiorbd]# ./fio fiorbd
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32




AFTER RESTARTING OSDS
----------------------
[root@ceph1-3 fiorbd]# ./fio fiorbd
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
...
fio-2.2.7-10-g51e9
Starting 10 processes
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops] [eta 01h:09m:27s]
fio: terminating on signal 2

rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
  read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec
    slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
    clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
     lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
    clat percentiles (usec):
     |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
     | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
     | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
     | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
     | 99.99th=[436224]
    bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
    lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
    lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
    lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
    lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
  cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
  IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
     issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s, mint=7389msec, maxt=7389msec

Disk stats (read/write):
  sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%




CEPH LOG
--------

before restarting osd
----------------------

2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 298 MB/s rd, 76465 op/s
2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 333 MB/s rd, 85355 op/s
2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 343 MB/s rd, 87932 op/s
2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 328 MB/s rd, 84151 op/s
2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 237 MB/s rd, 60855 op/s
2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 144 MB/s rd, 36935 op/s
2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 321 MB/s rd, 82334 op/s
2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 368 MB/s rd, 94211 op/s
2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 244 MB/s rd, 62644 op/s
2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 175 MB/s rd, 44997 op/s
2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 122 MB/s rd, 31259 op/s
2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58674 op/s
2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 271 MB/s rd, 69501 op/s
2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 211 MB/s rd, 54020 op/s
2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 164 MB/s rd, 42001 op/s
2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 134 MB/s rd, 34380 op/s
2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 293 MB/s rd, 75213 op/s
2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 337 MB/s rd, 86353 op/s
2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58839 op/s
2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 152 MB/s rd, 39117 op/s


restarting osd
---------------

2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster [INF] osd.0 marked itself down
2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster [INF] osdmap e849: 9 osds: 8 up, 9 in
2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 kB/s rd, 130 op/s
2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster [INF] osdmap e850: 9 osds: 8 up, 9 in
2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster [INF] osd.1 marked itself down
2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster [INF] osdmap e851: 9 osds: 7 up, 9 in
2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster [INF] osdmap e852: 9 osds: 7 up, 9 in
2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster [INF] osd.2 marked itself down
2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster [INF] osdmap e853: 9 osds: 6 up, 9 in
2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster [INF] osdmap e854: 9 osds: 6 up, 9 in
2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster [INF] osd.1 10.7.0.152:6800/14848 boot
2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster [INF] osdmap e855: 9 osds: 7 up, 9 in
2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster [INF] osd.3 marked itself down
2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster [INF] osdmap e856: 9 osds: 6 up, 9 in
2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster [INF] osd.6 marked itself down
2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster [INF] osd.2 10.7.0.152:6812/15410 boot
2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster [INF] osdmap e857: 9 osds: 6 up, 9 in
2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster [INF] osdmap e858: 9 osds: 6 up, 9 in
2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster [INF] osd.3 10.7.0.152:6816/15971 boot
2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster [INF] osdmap e859: 9 osds: 7 up, 9 in
2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster [INF] osd.7 marked itself down
2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster [INF] osd.0 10.7.0.152:6804/14481 boot
2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster [INF] osdmap e860: 9 osds: 7 up, 9 in
2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster [INF] osd.6 10.7.0.153:6808/21955 boot
2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster [INF] osdmap e861: 9 osds: 8 up, 9 in
2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 stale+active+remapped, 498 stale+active+clean, 2 peering, 28 active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster [INF] osd.8 marked itself down
2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster [INF] osdmap e862: 9 osds: 7 up, 9 in
2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 stale+active+remapped, 587 stale+active+clean, 2 peering, 11 active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster [INF] osd.7 10.7.0.153:6804/22536 boot
2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster [INF] osdmap e863: 9 osds: 8 up, 9 in
2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster [INF] osdmap e864: 9 osds: 8 up, 9 in
2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active


AFTER OSD RESTART
------------------


2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 786 MB/s rd, 196 kop/s
2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1578 MB/s rd, 394 kop/s
2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 932 MB/s rd, 233 kop/s
2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 627 MB/s rd, 156 kop/s
2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1034 MB/s rd, 258 kop/s
2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 529 MB/s rd, 132 kop/s
2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 770 MB/s rd, 192 kop/s
2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1358 MB/s rd, 339 kop/s
2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 649 MB/s rd, 162 kop/s
2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 580 MB/s rd, 145 kop/s
2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 962 MB/s rd, 240 kop/s
2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 506 MB/s rd, 126 kop/s
2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 774 MB/s rd, 193 kop/s
2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1363 MB/s rd, 340 kop/s
2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 662 MB/s rd, 165 kop/s
2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 593 MB/s rd, 148 kop/s
2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 938 MB/s rd, 234 kop/s



* Re: strange benchmark problem: restarting osd daemons improves performance from 100k iops to 300k iops
From: Alexandre DERUMIER @ 2015-04-22  9:01 UTC (permalink / raw)
  To: ceph-devel, ceph-users

I wonder if it could be NUMA related.

I'm using CentOS 7.1, and automatic NUMA balancing is enabled:

# cat /proc/sys/kernel/numa_balancing
1

Maybe the osd daemons are accessing buffers on the wrong NUMA node.

I'll try to reproduce the problem.
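
One way to test that hypothesis (a sketch, assuming the numactl/numastat tools are installed) is to disable automatic NUMA balancing and re-run the benchmark, and to check on which node an osd's memory actually lives:

    # temporarily disable automatic NUMA balancing
    sysctl kernel.numa_balancing=0

    # per-NUMA-node memory breakdown of the first ceph-osd process
    numastat -p $(pidof ceph-osd | awk '{print $1}')

If the numastat counters sit mostly on a different node than the one the osd threads run on, that would support the theory.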


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, 22 April 2015 10:40:05
Subject: [ceph-users] strange benchmark problem: restarting osd daemons improves performance from 100k iops to 300k iops



* Re: strange benchmark problem: restarting osd daemons improves performance from 100k iops to 300k iops
From: Milosz Tanski @ 2015-04-22 10:54 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel, ceph-users



On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
wrote:

> I wonder if it could be NUMA related.
>
> I'm using CentOS 7.1, and automatic NUMA balancing is enabled:
>
> # cat /proc/sys/kernel/numa_balancing
> 1
>
> Maybe the osd daemons are accessing buffers on the wrong NUMA node.
>
> I'll try to reproduce the problem.
>

Can you force the degenerate case using numactl, to either confirm or rule out
your suspicion?
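
Something along these lines might do it (just a sketch; the osd id, cluster name and node numbers are assumptions that depend on the setup). It starts an osd in the foreground with its threads pinned to node 0 while all memory allocations are forced onto node 1:

    # worst case on purpose: CPUs on node 0, memory on node 1
    numactl --cpunodebind=0 --membind=1 ceph-osd -i 0 --cluster ceph -f

If the slow pre-restart numbers reproduce under that pinning, and disappear with --membind=0, the NUMA theory would gain weight.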


>
>
> ----- Mail original -----
> De: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
> À: "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "ceph-users" <
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
> Envoyé: Mercredi 22 Avril 2015 10:40:05
> Objet: [ceph-users] strange benchmark problem : restarting osd daemon
> improve performance from 100k iops to 300k iops
>
> Hi,
>
> I was doing some benchmarks,
> I have found an strange behaviour.
>
> Using fio with rbd engine, I was able to reach around 100k iops.
> (osd datas in linux buffer, iostat show 0% disk access)
>
> then after restarting all osd daemons,
>
> the same fio benchmark show now around 300k iops.
> (osd datas in linux buffer, iostat show 0% disk access)
>
>
> any ideas?
>
>
>
>
> before restarting osd
> ---------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
> ...
> fio-2.2.7-10-g51e9
> Starting 10 processes
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 iops]
> [eta 14m:45s]
> fio: terminating on signal 2
>
> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22
> 10:00:04 2015
> read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec
> slat (usec): min=5, max=3685, avg=16.89, stdev=17.38
> clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23
> lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42
> clat percentiles (usec):
> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
> | 99.99th=[47360]
> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95
> lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79%
> lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03%
> lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37%
> lat (msec) : 100=0.01%
> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710
> IO depths : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
> issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s,
> mint=26215msec, maxt=26215msec
>
> Disk stats (read/write):
> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
> [root@ceph1-3 fiorbd]# ./fio fiorbd
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
>
>
>
>
> AFTER RESTARTING OSDS
> ----------------------
> [root@ceph1-3 fiorbd]# ./fio fiorbd
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
> ...
> fio-2.2.7-10-g51e9
> Starting 10 processes
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> rbd engine: RBD version: 0.1.9
> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops]
> [eta 01h:09m:27s]
> fio: terminating on signal 2
>
> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22
> 10:02:28 2015
> read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt= 7389msec
> slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
> clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
> lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
> clat percentiles (usec):
> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450],
> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796],
> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
> | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
> | 99.99th=[436224]
> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46,
> stdev=28263.82
> lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
> lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
> lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
> lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
> cpu : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
> IO depths : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
> issued : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s,
> mint=7389msec, maxt=7389msec
>
> Disk stats (read/write):
> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>
>
>
>
> CEPH LOG
> --------
>
> before restarting osd
> ----------------------
>
> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster [INF]
> pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 active+remapped,
> 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 298
> MB/s rd, 76465 op/s
> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster [INF]
> pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 active+remapped,
> 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 333
> MB/s rd, 85355 op/s
> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster [INF]
> pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 active+remapped,
> 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 343
> MB/s rd, 87932 op/s
> [... rest of quoted ceph log and list footers trimmed; duplicated from the original message earlier in the thread ...]



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com


* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-22 10:54       ` Milosz Tanski
@ 2015-04-22 13:56         ` Alexandre DERUMIER
       [not found]           ` <886679306.527978224.1429710961512.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-22 13:56 UTC (permalink / raw)
  To: Milosz Tanski; +Cc: ceph-devel, ceph-users

Hi,

I have done a lot of tests today, and it does indeed seem numa related.

My numastat was

# numastat
                           node0           node1
numa_hit                99075422       153976877
numa_miss              167490965         1493663
numa_foreign             1493663       167491417
interleave_hit            157745          167015
local_node              99049179       153830554
other_node             167517697         1639986

So, a lot of misses.
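
For what it's worth, a per-daemon view can confirm where the misses come from; assuming the rewritten numastat from numactl 2.x (which accepts a process pattern), something like:

# per-node memory allocation, broken down per ceph-osd process
numastat -p ceph-osd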

In this case, I can reproduce the iops going from 85k to 300k, up and down.

Now, setting
echo 0 > /proc/sys/kernel/numa_balancing

and starting osd daemons with

numactl --interleave=all /usr/bin/ceph-osd
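
To be explicit, the full sequence is roughly this (a sketch for my setup: osd ids 0-5 and sysvinit scripts on CentOS 7.1; adapt to yours):

# disable automatic numa balancing (runtime only; set
# kernel.numa_balancing=0 in /etc/sysctl.conf to persist)
echo 0 > /proc/sys/kernel/numa_balancing

# restart each osd with its memory interleaved across both nodes
for id in 0 1 2 3 4 5; do
    /etc/init.d/ceph stop osd.$id
    numactl --interleave=all /usr/bin/ceph-osd -i $id
done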


I have a constant 300k iops!
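
To double-check that the policy really applied, the mappings in /proc/<pid>/numa_maps should report "interleave" instead of "default", for example:

# count interleaved mappings for one ceph-osd process
grep -c interleave /proc/$(pidof -s ceph-osd)/numa_maps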


I wonder if it could be improved further by binding each osd daemon to a specific numa node.
I have 2 numa nodes of 10 cores each, with 6 osds total, but I think it would also require tuning the osd thread counts in ceph.conf.
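
Something like this is what I have in mind (untested sketch, and the osd-to-node split is arbitrary):

# pin half the osds to each numa node, cpu and memory together
for id in 0 1 2; do
    numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i $id
done
for id in 3 4 5; do
    numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i $id
done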



----- Original Message -----
From: "Milosz Tanski" <milosz@adfin.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 12:54:23
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops



On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:


I wonder if it could be numa related, 

I'm using centos 7.1, 
and auto numa balancing is enabled

cat /proc/sys/kernel/numa_balancing = 1 

Maybe the osd daemons access buffers on the wrong numa node.

I'll try to reproduce the problem 



Can you force the degenerate case using numactl, to either confirm or rule out your suspicion?



----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 10:40:05
Subject: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

[... quoted message body (benchmark setup, fio results, ceph logs) trimmed; duplicated in full earlier in the thread ...]


* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]           ` <886679306.527978224.1429710961512.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-22 14:01             ` Mark Nelson
       [not found]               ` <5537A9A8.10200-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Nelson @ 2015-04-22 14:01 UTC (permalink / raw)
  To: Alexandre DERUMIER, Milosz Tanski; +Cc: ceph-devel, ceph-users

Hi Alexandre,

We should discuss this at the perf meeting today.  We knew NUMA node 
affinity issues were going to crop up sooner or later (and indeed 
already have in some cases), but this is pretty major.  It's probably 
time to really dig in and figure out how to deal with this.

Note: this is one of the reasons I like small nodes with single sockets 
and fewer OSDs.

Mark

On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I have done a lot of tests today, and it does indeed seem numa related.
>
> My numastat was
>
> # numastat
>                             node0           node1
> numa_hit                99075422       153976877
> numa_miss              167490965         1493663
> numa_foreign             1493663       167491417
> interleave_hit            157745          167015
> local_node              99049179       153830554
> other_node             167517697         1639986
>
> So, a lot of misses.
>
> In this case, I can reproduce the iops going from 85k to 300k, up and down.
>
> Now, setting
> echo 0 > /proc/sys/kernel/numa_balancing
>
> and starting osd daemons with
>
> numactl --interleave=all /usr/bin/ceph-osd
>
>
> I have a constant 300k iops!
>
>
> I wonder if it could be improved further by binding each osd daemon to a specific numa node.
> I have 2 numa nodes of 10 cores each, with 6 osds total, but I think it would also require tuning the osd thread counts in ceph.conf.
>
>
>
> ----- Original Message -----
> From: "Milosz Tanski" <milosz@adfin.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Wednesday, April 22, 2015 12:54:23
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
>
>
> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>
>
> I wonder if it could be numa related,
>
> I'm using centos 7.1,
> and auto numa balancing is enabled
>
> cat /proc/sys/kernel/numa_balancing = 1
>
> Maybe the osd daemons access buffers on the wrong numa node.
>
> I'll try to reproduce the problem
>
>
>
> Can you force the degenerate case using numactl, to either confirm or rule out your suspicion?
>
>
>
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Wednesday, April 22, 2015 10:40:05
> Subject: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> [... quoted benchmark setup, fio results, and pre-restart ceph log trimmed; duplicated earlier in the thread ...]
>
>
> restarting osd
> ---------------
>
> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster [INF] osd.0 marked itself down
> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster [INF] osdmap e849: 9 osds: 8 up, 9 in
> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 kB/s rd, 130 op/s
> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster [INF] osdmap e850: 9 osds: 8 up, 9 in
> [snip: remainder of quoted ceph log and list footer trimmed -- the same log appears in full elsewhere in this thread]
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]               ` <5537A9A8.10200-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-04-22 14:09                 ` Alexandre DERUMIER
  2015-04-22 14:34                 ` Srinivasula Maram
  1 sibling, 0 replies; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-22 14:09 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel, Milosz Tanski, ceph-users

>>We should discuss this at the perf meeting today. We knew NUMA node 
>>affinity issues were going to crop up sooner or later (and indeed 
>>already have in some cases), but this is pretty major. It's probably 
>>time to really dig in and figure out how to deal with this. 

Damn, I'm on the road currently; it'll be too short notice for me to make the meeting today.
Sorry.

>>Note: this is one of the reasons I like small nodes with single sockets 
>>and fewer OSDs. 

Well, indeed it's a dual-socket node, but with only 4 OSDs currently.
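
For when I'm back, here is the per-node pinning I plan to test -- just a sketch, the osd ids and the node split below are hypothetical and need adapting to the real topology:

# keep pages where we allocate them
echo 0 > /proc/sys/kernel/numa_balancing
# bind each osd's cpus and memory to a single socket, half per node
numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0
numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 1
numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 2
numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 3

Comparing numastat before and after a run should show whether numa_miss stops growing.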






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Envoyé: Mercredi 22 Avril 2015 16:01:12
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi Alexandre, 

We should discuss this at the perf meeting today. We knew NUMA node 
affinity issues were going to crop up sooner or later (and indeed 
already have in some cases), but this is pretty major. It's probably 
time to really dig in and figure out how to deal with this. 

Note: this is one of the reasons I like small nodes with single sockets 
and fewer OSDs. 

Mark 

On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
> Hi, 
> 
> I have done a lot of tests today, and it seems indeed NUMA-related. 
> 
> My numastat was 
> 
> # numastat 
>                            node0           node1 
> numa_hit                99075422       153976877 
> numa_miss              167490965         1493663 
> numa_foreign             1493663       167491417 
> interleave_hit            157745          167015 
> local_node              99049179       153830554 
> other_node             167517697         1639986 
> 
> So, a lot of misses. 
> 
> In this case, I can reproduce IOPS going from 85k to 300k, up and down. 
> 
> now setting 
> echo 0 > /proc/sys/kernel/numa_balancing 
> 
> and starting osd daemons with 
> 
> numactl --interleave=all /usr/bin/ceph-osd 
> 
> 
> I have a constant 300k iops ! 
> 
> 
> I wonder if it could be improved by binding osd daemons to a specific NUMA node. 
> I have 2 NUMA nodes of 10 cores with 6 OSDs, but I think it would also require ceph.conf osd thread tuning. 
> 
> 
> 
> ----- Mail original ----- 
> De: "Milosz Tanski" <milosz@adfin.com> 
> À: "aderumier" <aderumier@odiso.com> 
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
> Envoyé: Mercredi 22 Avril 2015 12:54:23 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> 
> 
> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < aderumier@odiso.com > wrote: 
> 
> 
> I wonder if it could be numa related, 
> 
> I'm using centos 7.1, 
> and auto numa balancing is enabled 
> 
> cat /proc/sys/kernel/numa_balancing = 1 
> 
> Maybe the osd daemons access buffers on the wrong NUMA node. 
> 
> I'll try to reproduce the problem 
> 
> 
> 
> Can you force the degenerate case using numactl? To either affirm or deny your suspicion. 
> 
> 
> 
> 
> ----- Mail original ----- 
> De: "aderumier" < aderumier@odiso.com > 
> À: "ceph-devel" < ceph-devel@vger.kernel.org >, "ceph-users" < ceph-users@lists.ceph.com > 
> Envoyé: Mercredi 22 Avril 2015 10:40:05 
> Objet: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> 
> I was doing some benchmarks, 
> I have found a strange behaviour. 
> 
> Using fio with rbd engine, I was able to reach around 100k iops. 
> (osd datas in linux buffer, iostat show 0% disk access) 
> 
> then after restarting all osd daemons, 
> 
> the same fio benchmark show now around 300k iops. 
> (osd datas in linux buffer, iostat show 0% disk access) 
> 
> 
> any ideas? 
> 
> 
> 
> 
> [snip: fio results and ceph logs trimmed -- they repeat, verbatim, the output already posted in the original message]
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]               ` <5537A9A8.10200-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-04-22 14:09                 ` Alexandre DERUMIER
@ 2015-04-22 14:34                 ` Srinivasula Maram
  2015-04-22 16:30                   ` [ceph-users] " Alexandre DERUMIER
  1 sibling, 1 reply; 35+ messages in thread
From: Srinivasula Maram @ 2015-04-22 14:34 UTC (permalink / raw)
  To: Mark Nelson, Alexandre DERUMIER, Milosz Tanski; +Cc: ceph-devel, ceph-users

I feel it is due to a tcmalloc issue.

I have seen a similar issue in my setup after 20 days.
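
If it is the known thread cache behaviour, it may be worth testing with a bigger tcmalloc cache before starting the osd -- a sketch, the 128 MB value is an arbitrary guess on my side (gperftools defaults to 32 MB):

# raise tcmalloc's total per-thread cache limit for this process
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0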

Thanks,
Srinivas



-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
Sent: Wednesday, April 22, 2015 7:31 PM
To: Alexandre DERUMIER; Milosz Tanski
Cc: ceph-devel; ceph-users
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi Alexandre,

We should discuss this at the perf meeting today.  We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major.  It's probably time to really dig in and figure out how to deal with this.

Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs.

Mark

On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I have done a lot of tests today, and it seems indeed NUMA-related.
>
> My numastat was
>
> # numastat
>                             node0           node1
> numa_hit                99075422       153976877
> numa_miss              167490965         1493663
> numa_foreign             1493663       167491417
> interleave_hit            157745          167015
> local_node              99049179       153830554
> other_node             167517697         1639986
>
> So, a lot of misses.
>
> In this case, I can reproduce IOPS going from 85k to 300k, up and down.
>
> now setting
> echo 0 > /proc/sys/kernel/numa_balancing
>
> and starting osd daemons with
>
> numactl --interleave=all /usr/bin/ceph-osd
>
>
> I have a constant 300k iops !
>
>
> I wonder if it could be improved by binding osd daemons to a specific NUMA node.
> I have 2 NUMA nodes of 10 cores with 6 OSDs, but I think it would also require ceph.conf osd thread tuning.
>
>
>
> [snip: quoted mail chain plus duplicate fio results and ceph logs trimmed -- full copies appear earlier in the thread]
>
>
> AFTER OSD RESTART
> ------------------
>
>
> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 786 MB/s rd, 196 kop/s
> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 1578 MB/s rd, 394 kop/s
> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 932 MB/s rd, 233 kop/s
> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 627 MB/s rd, 156 kop/s
> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 1034 MB/s rd, 258 kop/s
> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 529 MB/s rd, 132 kop/s
> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 770 MB/s rd, 192 kop/s
> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 1358 MB/s rd, 339 kop/s
> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 649 MB/s rd, 162 kop/s
> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 580 MB/s rd, 145 kop/s
> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 962 MB/s rd, 240 kop/s
> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 506 MB/s rd, 126 kop/s
> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 774 MB/s rd, 193 kop/s
> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 1363 MB/s rd, 340 kop/s
> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 662 MB/s rd, 165 kop/s
> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 593 MB/s rd, 148 kop/s
> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> 1295 GB avail; 938 MB/s rd, 234 kop/s
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-22 14:34                 ` Srinivasula Maram
@ 2015-04-22 16:30                   ` Alexandre DERUMIER
       [not found]                     ` <336199846.535234014.1429720226874.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-22 16:30 UTC (permalink / raw)
  To: Srinivasula Maram; +Cc: Mark Nelson, Milosz Tanski, ceph-devel, ceph-users

Hi,

>>I feel it is due to the tcmalloc issue

Indeed, I had patched one of my nodes, but not the other.
So maybe I have hit this bug (but I can't confirm; I don't have traces).

But NUMA interleaving does seem to help in my case (maybe not 100k -> 300k, but more like 250k -> 300k).

I need to run longer tests to confirm that.
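
(If the tcmalloc problem is the known thread-cache contention, a commonly cited workaround -- an assumption here, not something verified in this thread -- is to raise gperftools' thread-cache ceiling before starting each OSD:)

# 128 MB thread cache instead of the default; osd id illustrative
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0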


----- Original Message -----
From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 16:34:33
Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

I feel it is due to the tcmalloc issue.

I have seen a similar issue in my setup after 20 days.

Thanks,
Srinivas



-----Original Message----- 
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson 
Sent: Wednesday, April 22, 2015 7:31 PM 
To: Alexandre DERUMIER; Milosz Tanski 
Cc: ceph-devel; ceph-users 
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 

Hi Alexandre, 

We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this. 

Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs. 

Mark 

On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
> Hi,
>
> I have done a lot of tests today, and it does seem NUMA-related.
>
> My numastat was:
>
> # numastat
>                          node0        node1
> numa_hit              99075422    153976877
> numa_miss            167490965      1493663
> numa_foreign           1493663    167491417
> interleave_hit          157745       167015
> local_node            99049179    153830554
> other_node           167517697      1639986
>
> So, a lot of misses.
>
> In this case, I can reproduce IOPS going from 85k to 300k, up and down.
>
> Now setting
>
> echo 0 > /proc/sys/kernel/numa_balancing
>
> and starting the OSD daemons with
>
> numactl --interleave=all /usr/bin/ceph-osd
>
> I get a constant 300k iops!
>
> I wonder if it could be improved further by binding each OSD daemon to a specific NUMA node.
> I have 2 NUMA nodes of 10 cores each, with 6 OSDs, but I think that also requires tuning the osd thread settings in ceph.conf.
> 
> 
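
(For reference, the per-node binding wondered about above would look roughly like this with numactl -- a sketch, osd ids and node numbers illustrative:)

# disable automatic NUMA page migration (equivalently: sysctl -w kernel.numa_balancing=0)
echo 0 > /proc/sys/kernel/numa_balancing
# pin osd.0's threads and memory to node 0, osd.1 to node 1, and so on
numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0
numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 1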
> 
> ----- Original Message -----
> From: "Milosz Tanski" <milosz@adfin.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Wednesday, April 22, 2015 12:54:23
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
> 
> 
> 
> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>
>
> I wonder if it could be NUMA-related.
>
> I'm using CentOS 7.1,
> and automatic NUMA balancing is enabled:
>
> cat /proc/sys/kernel/numa_balancing = 1
>
> Maybe the OSD daemons access buffers on the wrong NUMA node.
>
> I'll try to reproduce the problem.
> 
> 
> 
> Can you force the degenerate case using numactl? To either affirm or deny your suspicion. 
> 
>  
> 
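
(Forcing the degenerate case Milosz asks about could be done by deliberately binding an OSD's memory to the remote node -- a sketch, ids illustrative:)

numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0   # threads on node 0, memory on node 1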
> 
_______________________________________________ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                     ` <336199846.535234014.1429720226874.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-23  8:00                       ` Alexandre DERUMIER
  2015-04-23  8:04                         ` 答复: [ceph-users] " 管清政
       [not found]                         ` <1806383776.554938002.1429776034371.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 2 replies; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-23  8:00 UTC (permalink / raw)
  To: Srinivasula Maram; +Cc: ceph-users, ceph-devel, Milosz Tanski

Hi,
I'm hitting this bug again today.

So it doesn't seem to be NUMA-related (I have tried flushing the Linux buffers to be sure).

And tcmalloc is patched (though I don't know how to verify that it's OK).
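
(A quick sanity check -- this only shows which tcmalloc build the OSD is linked against, not whether the patch itself is applied; the package name assumes CentOS 7.x as used here:)

ldd /usr/bin/ceph-osd | grep tcmalloc
rpm -q gperftools-libs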

I haven't restarted the OSDs yet.

Maybe some perf trace could be useful?
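
(A minimal way to grab such a trace, assuming perf is installed -- tcmalloc symbols near the top of the profile would support Srinivas's theory:)

perf top -p $(pidof ceph-osd | awk '{print $1}')   # live profile of one OSD
perf record -g -p <osd-pid> -- sleep 30            # or record 30 seconds with call graphs
perf report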


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Wednesday, April 22, 2015 18:30:26
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 649 MB/s rd, 162 kop/s 
> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 580 MB/s rd, 145 kop/s 
> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 962 MB/s rd, 240 kop/s 
> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 506 MB/s rd, 126 kop/s 
> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 774 MB/s rd, 193 kop/s 
> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 662 MB/s rd, 165 kop/s 
> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 593 MB/s rd, 148 kop/s 
> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 938 MB/s rd, 234 kop/s 
> 
> _______________________________________________ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
_______________________________________________ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-23  8:00                       ` Alexandre DERUMIER
@ 2015-04-23  8:04                         ` 管清政
       [not found]                         ` <1806383776.554938002.1429776034371.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  1 sibling, 0 replies; 35+ messages in thread
From: 管清政 @ 2015-04-23  8:04 UTC (permalink / raw)
  To: ceph-devel



-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
Sent: April 23, 2015 16:01
To: Srinivasula Maram
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi,
I'm hitting this bug again today.

So it doesn't seem to be NUMA related (I have tried flushing the linux buffers to be sure).

And tcmalloc is patched (I don't know how to verify that it's ok).

I haven't restarted the osds yet.

Maybe some perf traces could be useful?
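
(For example, something along these lines -- illustrative commands, assuming the perf package is installed; pick one osd pid:

perf record -g -p <osd pid> -- sleep 30
perf report --stdio

or perf top -p <osd pid> for a live view.)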



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                         ` <1806383776.554938002.1429776034371.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-23 10:58                           ` Alexandre DERUMIER
  2015-04-23 11:33                             ` [ceph-users] " Mark Nelson
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-23 10:58 UTC (permalink / raw)
  To: Srinivasula Maram; +Cc: ceph-users, ceph-devel, Milosz Tanski

Maybe it's tcmalloc related.
I thought I had patched it correctly, but perf shows a lot of tcmalloc::ThreadCache::ReleaseToCentralCache:

before osd restart (100k)
------------------
  11.66%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
   8.51%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
   3.04%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseToSpans
   2.04%        ceph-osd  libtcmalloc.so.4.1.2  [.] operator new
   1.63%         swapper  [kernel.kallsyms]     [k] intel_idle
   1.35%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseListToSpans
   1.33%        ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete
   1.07%        ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
   0.91%        ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock
   0.88%        ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back
   0.81%        ceph-osd  ceph-osd              [.] Mutex::Lock
   0.79%        ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string
   0.74%        ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock
   0.67%        ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock
   0.63%         swapper  [kernel.kallsyms]     [k] native_write_msr_safe
   0.62%        ceph-osd  [kernel.kallsyms]     [k] avc_has_perm_noaudit
   0.58%        ceph-osd  ceph-osd              [.] operator<
   0.57%        ceph-osd  [kernel.kallsyms]     [k] __schedule
   0.57%        ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
   0.54%         swapper  [kernel.kallsyms]     [k] __schedule


after osd restart (300k iops)
------------------------------
   3.47%      ceph-osd  libtcmalloc.so.4.1.2  [.] operator new
   1.92%      ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete
   1.86%       swapper  [kernel.kallsyms]     [k] intel_idle
   1.52%      ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
   1.34%      ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
   1.24%      ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back
   1.23%      ceph-osd  ceph-osd              [.] Mutex::Lock
   1.21%      ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock
   1.11%      ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string
   0.95%      ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock
   0.94%      ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock
   0.78%      ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
   0.70%      ceph-osd  [kernel.kallsyms]     [k] tcp_sendmsg
   0.70%      ceph-osd  ceph-osd              [.] Message::Message
   0.68%      ceph-osd  [kernel.kallsyms]     [k] __schedule
   0.66%      ceph-osd  [kernel.kallsyms]     [k] idle_cpu
   0.65%      ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
   0.64%       swapper  [kernel.kallsyms]     [k] native_write_msr_safe
   0.61%      ceph-osd  ceph-osd              [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
   0.60%       swapper  [kernel.kallsyms]     [k] __schedule
   0.60%      ceph-osd  libstdc++.so.6.0.19   [.] 0x00000000000bdd2b
   0.57%      ceph-osd  ceph-osd              [.] operator<
   0.57%      ceph-osd  ceph-osd              [.] crc32_iscsi_00
   0.56%      ceph-osd  libstdc++.so.6.0.19   [.] std::string::_Rep::_M_dispose
   0.55%      ceph-osd  [kernel.kallsyms]     [k] __switch_to
   0.54%      ceph-osd  libc-2.17.so          [.] vfprintf
   0.52%      ceph-osd  [kernel.kallsyms]     [k] fget_light
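
(If it is the known tcmalloc thread-cache problem, one knob worth testing -- assuming the patched gperftools actually honors the variable -- is to enlarge the thread cache before starting the osds, e.g.:

TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf

The 128MB value here is only an illustrative guess.)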

----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Thursday, April 23, 2015 10:00:34
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi,
I'm hitting this bug again today.

So it doesn't seem to be NUMA related (I have tried flushing the linux buffers to be sure).

And tcmalloc is patched (I don't know how to verify that it's ok).

I haven't restarted the osds yet.

Maybe some perf traces could be useful?
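
(One way to check which tcmalloc the osds actually run with -- illustrative commands:

ldd /usr/bin/ceph-osd | grep tcmalloc
grep tcmalloc /proc/<osd pid>/maps

and compare the reported library path/version against the patched gperftools package.)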


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Wednesday, April 22, 2015 18:30:26
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi,

>>I feel it is due to the tcmalloc issue

Indeed, I had patched one of my nodes, but not the other.
So maybe I have hit this bug. (But I can't confirm, as I don't have traces.)

But NUMA interleaving seems to help in my case (maybe not from 100->300k, but 250k->300k).

I need to run longer tests to confirm that.
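
(To check the interleaving effect, the per-process NUMA placement can be watched with, e.g.:

numastat -p <osd pid>

Illustrative command; with --interleave=all the pages should stay roughly balanced across both nodes.)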


----- Original Message -----
From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 16:34:33
Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

I feel it is due to the tcmalloc issue.

I have seen a similar issue in my setup after 20 days.

Thanks,
Srinivas



-----Original Message----- 
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson 
Sent: Wednesday, April 22, 2015 7:31 PM 
To: Alexandre DERUMIER; Milosz Tanski 
Cc: ceph-devel; ceph-users 
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 

Hi Alexandre, 

We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this. 

Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs. 

Mark 

On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
> Hi, 
> 
> I have done a lot of tests today, and it does seem to be NUMA related. 
> 
> My numastat was 
> 
> # numastat 
>                 node0      node1 
> numa_hit        99075422   153976877 
> numa_miss       167490965  1493663 
> numa_foreign    1493663    167491417 
> interleave_hit  157745     167015 
> local_node      99049179   153830554 
> other_node      167517697  1639986 
> 
> So, a lot of misses. 
> 
> In this case, I can reproduce iops going from 85k to 300k, up and down. 
> 
> Now, setting 
> echo 0 > /proc/sys/kernel/numa_balancing 
> 
> and starting the osd daemons with 
> 
> numactl --interleave=all /usr/bin/ceph-osd 
> 
> 
> I get a constant 300k iops! 
> 
> 
> I wonder if it could be improved by binding osd daemons to specific NUMA nodes. 
> I have 2 NUMA nodes of 10 cores with 6 osds, but I think it would also require ceph.conf osd thread tuning. 
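> 
> (A rough sketch of what per-node binding could look like -- illustrative only, the osd ids and node split are hypothetical: 
> 
> numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf 
> numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 1 -c /etc/ceph/ceph.conf 
> 
> i.e. pin each osd's threads and allocations to one node, half of the osds per node.) 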
> 
> 
> 
> ----- Original Message ----- 
> From: "Milosz Tanski" <milosz@adfin.com> 
> To: "aderumier" <aderumier@odiso.com> 
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
> Sent: Wednesday, April 22, 2015 12:54:23 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> 
> 
> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < aderumier@odiso.com > wrote: 
> 
> 
> I wonder if it could be NUMA related, 
> 
> I'm using centos 7.1, 
> and auto numa balancing is enabled 
> 
> cat /proc/sys/kernel/numa_balancing = 1 
> 
> Maybe the osd daemons access buffers on the wrong NUMA node. 
> 
> I'll try to reproduce the problem. 
> 
> 
> 
> Can you force the degenerate case using numactl? To either affirm or deny your suspicion. 
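> 
> (For example -- hypothetical command -- cpu on one node with memory forced to the other should reproduce the worst case: 
> 
> numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf 
> 
> which makes every allocation remote.) 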
> 
> 
> 
> 
> ----- Original Message ----- 
> From: "aderumier" <aderumier@odiso.com> 
> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
> Sent: Wednesday, April 22, 2015 10:40:05 
> Subject: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> 
> I was doing some benchmarks, 
> and I found a strange behaviour. 
> 
> Using fio with the rbd engine, I was able to reach around 100k iops. 
> (osd data is in the linux buffer cache; iostat shows 0% disk access) 
> 
> Then, after restarting all osd daemons, 
> 
> the same fio benchmark now shows around 300k iops. 
> (osd data is in the linux buffer cache; iostat shows 0% disk access) 
> 
> 
> Any ideas? 
> 
> 
> 
> 
> before restarting osd 
> --------------------- 
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
> ... 
> fio-2.2.7-10-g51e9 
> Starting 10 processes 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 iops] [eta 14m:45s] 
> fio: terminating on signal 2 
> 
> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015 
>   read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec 
>     slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 
>     clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 
>      lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42 
>     clat percentiles (usec): 
>      |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247], 
>      | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660], 
>      | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
>      | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
>      | 99.99th=[47360] 
>     bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95 
>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79% 
>     lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% 
>     lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% 
>     lat (msec) : 100=0.01% 
>   cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 
>   IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% 
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>      complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0% 
>      issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>      latency   : target=0, window=0, percentile=100.00%, depth=32 
> 
> Run status group 0 (all jobs): 
>    READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, mint=26215msec, maxt=26215msec 
> 
> Disk stats (read/write): 
>   sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
> [root@ceph1-3 fiorbd]# ./fio fiorbd 
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
> 
> 
> 
> 
> AFTER RESTARTING OSDS 
> ---------------------- 
> [root@ceph1-3 fiorbd]# ./fio fiorbd 
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
> ... 
> fio-2.2.7-10-g51e9 
> Starting 10 processes 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> rbd engine: RBD version: 0.1.9 
> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops] [eta 01h:09m:27s] 
> fio: terminating on signal 2 
> 
> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015 
>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec 
>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12 
>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28 
>     clat percentiles (usec): 
>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450], 
>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796], 
>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>      | 99.99th=[436224] 
>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82 
>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23% 
>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% 
>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% 
>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01% 
>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832 
>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% 
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% 
>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>      latency   : target=0, window=0, percentile=100.00%, depth=32 
> 
> Run status group 0 (all jobs): 
>    READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
> 
> Disk stats (read/write): 
>   sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
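> 
> (For completeness, a "fiorbd" job file producing a run like the above would look roughly like this -- reconstructed from the output; the pool/image/client names are hypothetical: 
> 
> [global] 
> ioengine=rbd 
> clientname=admin 
> pool=rbd 
> rbdname=fio-test 
> rw=randread 
> bs=4k 
> iodepth=32 
> numjobs=10 
> group_reporting 
> 
> [rbd_iodepth32-test] 
> ) 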
> 
> 
> 
> 
> CEPH LOG 
> -------- 
> 
> before restarting osd 
> ---------------------- 
> 
> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 298 MB/s rd, 76465 op/s 
> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 333 MB/s rd, 85355 op/s 
> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 343 MB/s rd, 87932 op/s 
> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 328 MB/s rd, 84151 op/s 
> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 237 MB/s rd, 60855 op/s 
> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 144 MB/s rd, 36935 op/s 
> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 321 MB/s rd, 82334 op/s 
> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 368 MB/s rd, 94211 op/s 
> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 244 MB/s rd, 62644 op/s 
> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 175 MB/s rd, 44997 op/s 
> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 122 MB/s rd, 31259 op/s 
> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 229 MB/s rd, 58674 op/s 
> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 271 MB/s rd, 69501 op/s 
> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 211 MB/s rd, 54020 op/s 
> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 164 MB/s rd, 42001 op/s 
> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 134 MB/s rd, 34380 op/s 
> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 293 MB/s rd, 75213 op/s 
> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 337 MB/s rd, 86353 op/s 
> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 229 MB/s rd, 58839 op/s 
> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> 1295 GB avail; 152 MB/s rd, 39117 op/s 
> 
> 
> restarting osd 
> --------------- 
> 
> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
> [INF] osd.0 marked itself down 
> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
> [INF] osdmap e849: 9 osds: 8 up, 9 in 
> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
> kB/s rd, 130 op/s 
> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
> [INF] osdmap e850: 9 osds: 8 up, 9 in 
> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
> [INF] osd.1 marked itself down 
> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
> [INF] osdmap e851: 9 osds: 7 up, 9 in 
> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
> [INF] osdmap e852: 9 osds: 7 up, 9 in 
> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
> [INF] osd.2 marked itself down 
> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
> [INF] osdmap e853: 9 osds: 6 up, 9 in 
> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
> [INF] osdmap e854: 9 osds: 6 up, 9 in 
> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
> [INF] osd.1 10.7.0.152:6800/14848 boot 
> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
> [INF] osdmap e855: 9 osds: 7 up, 9 in 
> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
> [INF] osd.3 marked itself down 
> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
> [INF] osdmap e856: 9 osds: 6 up, 9 in 
> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
> [INF] osd.6 marked itself down 
> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
> [INF] osd.2 10.7.0.152:6812/15410 boot 
> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
> [INF] osdmap e857: 9 osds: 6 up, 9 in 
> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
> [INF] osdmap e858: 9 osds: 6 up, 9 in 
> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
> [INF] osd.3 10.7.0.152:6816/15971 boot 
> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
> [INF] osdmap e859: 9 osds: 7 up, 9 in 
> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
> [INF] osd.7 marked itself down 
> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
> [INF] osd.0 10.7.0.152:6804/14481 boot 
> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
> [INF] osdmap e860: 9 osds: 7 up, 9 in 
> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 
> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
> [INF] osd.6 10.7.0.153:6808/21955 boot 
> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
> [INF] osdmap e861: 9 osds: 8 up, 9 in 
> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
> 420 GB used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
> [INF] osd.8 marked itself down 
> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
> [INF] osdmap e862: 9 osds: 7 up, 9 in 
> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
> 420 GB used, 874 GB / 1295 GB avail 
> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
> [INF] osd.7 10.7.0.153:6804/22536 boot 
> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
> [INF] osdmap e863: 9 osds: 8 up, 9 in 
> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
> stale+active+remapped, 503 stale+active+clean, 4 
> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 GB 
> / 1295 GB avail 
> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
> [INF] osdmap e864: 9 osds: 8 up, 9 in 
> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
> stale+active+remapped, 503 stale+active+clean, 4 
> active+undersized+degraded+remapped, 5 peering, 13 active 
> 
> 
> AFTER OSD RESTART 
> ------------------ 
> 
> 
> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 786 MB/s rd, 196 kop/s 
> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 932 MB/s rd, 233 kop/s 
> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 627 MB/s rd, 156 kop/s 
> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 529 MB/s rd, 132 kop/s 
> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 770 MB/s rd, 192 kop/s 
> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 649 MB/s rd, 162 kop/s 
> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 580 MB/s rd, 145 kop/s 
> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 962 MB/s rd, 240 kop/s 
> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 506 MB/s rd, 126 kop/s 
> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 774 MB/s rd, 193 kop/s 
> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 662 MB/s rd, 165 kop/s 
> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 593 MB/s rd, 148 kop/s 
> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> 1295 GB avail; 938 MB/s rd, 234 kop/s 
> 
> _______________________________________________ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-23 10:58                           ` Alexandre DERUMIER
@ 2015-04-23 11:33                             ` Mark Nelson
  2015-04-23 11:56                               ` Alexandre DERUMIER
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Nelson @ 2015-04-23 11:33 UTC (permalink / raw)
  To: Alexandre DERUMIER, Srinivasula Maram
  Cc: ceph-users, ceph-devel, Milosz Tanski

Thanks for the testing, Alexandre!

If you have the means to compile the same version of ceph with jemalloc, 
I would be very interested to see how it does.
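
Off the top of my head, something like this should produce a jemalloc 
build (untested, and assuming your ceph tree already carries the 
--with-jemalloc configure switch):

  ./autogen.sh
  ./configure --with-jemalloc
  make -j$(nproc)

Failing that, starting the osds with LD_PRELOAD pointed at libjemalloc.so 
would at least approximate it.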

In some ways I'm glad it turned out not to be NUMA.  I still suspect we 
will have to deal with it at some point, but perhaps not today. ;)

Mark

On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
> Maybe it's tcmalloc related. 
> I thought I had patched it correctly, but perf shows a lot of time in tcmalloc::ThreadCache::ReleaseToCentralCache. 
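> 
> The patch in question is the gperftools change that makes the thread 
> cache size tunable. Assuming it is really active, the cache could be 
> enlarged through each osd's environment; an untested sketch, with the 
> variable name taken from the gperftools docs: 
> 
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0 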
>
> before osd restart (100k)
> ------------------
>    11.66%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>     8.51%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
>     3.04%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseToSpans
>     2.04%        ceph-osd  libtcmalloc.so.4.1.2  [.] operator new
>     1.63%         swapper  [kernel.kallsyms]     [k] intel_idle
>     1.35%        ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>     1.33%        ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete
>     1.07%        ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>     0.91%        ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock
>     0.88%        ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back
>     0.81%        ceph-osd  ceph-osd              [.] Mutex::Lock
>     0.79%        ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string
>     0.74%        ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock
>     0.67%        ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock
>     0.63%         swapper  [kernel.kallsyms]     [k] native_write_msr_safe
>     0.62%        ceph-osd  [kernel.kallsyms]     [k] avc_has_perm_noaudit
>     0.58%        ceph-osd  ceph-osd              [.] operator<
>     0.57%        ceph-osd  [kernel.kallsyms]     [k] __schedule
>     0.57%        ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>     0.54%         swapper  [kernel.kallsyms]     [k] __schedule
>
>
> after osd restart (300k iops)
> ------------------------------
>     3.47%      ceph-osd  libtcmalloc.so.4.1.2  [.] operator new
>     1.92%      ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete
>     1.86%       swapper  [kernel.kallsyms]     [k] intel_idle
>     1.52%      ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>     1.34%      ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>     1.24%      ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back
>     1.23%      ceph-osd  ceph-osd              [.] Mutex::Lock
>     1.21%      ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock
>     1.11%      ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string
>     0.95%      ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock
>     0.94%      ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock
>     0.78%      ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>     0.70%      ceph-osd  [kernel.kallsyms]     [k] tcp_sendmsg
>     0.70%      ceph-osd  ceph-osd              [.] Message::Message
>     0.68%      ceph-osd  [kernel.kallsyms]     [k] __schedule
>     0.66%      ceph-osd  [kernel.kallsyms]     [k] idle_cpu
>     0.65%      ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
>     0.64%       swapper  [kernel.kallsyms]     [k] native_write_msr_safe
>     0.61%      ceph-osd  ceph-osd              [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>     0.60%       swapper  [kernel.kallsyms]     [k] __schedule
>     0.60%      ceph-osd  libstdc++.so.6.0.19   [.] 0x00000000000bdd2b
>     0.57%      ceph-osd  ceph-osd              [.] operator<
>     0.57%      ceph-osd  ceph-osd              [.] crc32_iscsi_00
>     0.56%      ceph-osd  libstdc++.so.6.0.19   [.] std::string::_Rep::_M_dispose
>     0.55%      ceph-osd  [kernel.kallsyms]     [k] __switch_to
>     0.54%      ceph-osd  libc-2.17.so          [.] vfprintf
>     0.52%      ceph-osd  [kernel.kallsyms]     [k] fget_light
>
> ----- Original Message ----- 
> From: "aderumier" <aderumier@odiso.com> 
> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Sent: Thursday, 23 April 2015 10:00:34 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>
> Hi,
> I'm hitting this bug again today.
>
> So it doesn't seem to be numa related (I flushed the linux buffers to be sure). 
> 
> And tcmalloc is patched (though I don't know how to verify that the patch is active). 
> 
> I haven't restarted the osds yet. 
> 
> Maybe some perf traces could be useful? 
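> 
> For instance, something along these lines while the bench is running 
> (just a sketch): 
> 
> perf top -p $(pidof ceph-osd | tr ' ' ',') 
> perf record -a -g -- sleep 30 && perf report 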
>
>
> ----- Original Message ----- 
> From: "aderumier" <aderumier@odiso.com> 
> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Sent: Wednesday, 22 April 2015 18:30:26 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>
> Hi,
>
>>> I feel it is due to the tcmalloc issue 
> 
> Indeed, I had patched one of my nodes, but not the other. 
> So maybe I hit this bug (but I can't confirm it; I don't have traces). 
> 
> But numa interleaving does seem to help in my case (maybe not from 100k->300k, but from 250k->300k). 
> 
> I need to run longer tests to confirm that. 
>
>
> ----- Original Message ----- 
> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com> 
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
> Sent: Wednesday, 22 April 2015 16:34:33 
> Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>
> I feel it is due to the tcmalloc issue. 
> 
> I have seen a similar issue in my setup after 20 days. 
>
> Thanks,
> Srinivas
>
>
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
> Sent: Wednesday, April 22, 2015 7:31 PM
> To: Alexandre DERUMIER; Milosz Tanski
> Cc: ceph-devel; ceph-users
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> Hi Alexandre,
>
> We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed they already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this. 
>
> Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs.
>
> Mark
>
> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I have done a lot of tests today, and it does indeed seem to be numa related. 
>>
>> My numastat was
>>
>> # numastat 
>>                           node0        node1 
>> numa_hit               99075422    153976877 
>> numa_miss             167490965      1493663 
>> numa_foreign            1493663    167491417 
>> interleave_hit           157745       167015 
>> local_node             99049179    153830554 
>> other_node            167517697      1639986 
>>
>> So, a lot of misses. 
>>
>> In this case, I can reproduce iops going from 85k to 300k, up and down. 
>>
>> Now, setting 
>> echo 0 > /proc/sys/kernel/numa_balancing 
>> 
>> and starting the osd daemons with 
>> 
>> numactl --interleave=all /usr/bin/ceph-osd 
>> 
>> 
>> I get a constant 300k iops! 
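>> 
>> (To verify the placement, something like "numastat -p ceph-osd" should 
>> show the per-node memory of each daemon; just a sketch, and the -p 
>> option needs a reasonably recent numastat.) 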
>>
>>
>> I wonder if it could be improved further by binding each osd daemon to a specific numa node. 
>> I have 2 numa nodes of 10 cores each, with 6 osds, but I think it would also require tuning the osd thread counts in ceph.conf. 
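>> 
>> Something like this per daemon is what I have in mind (untested; the 
>> node would have to match each osd's disk and NIC locality): 
>> 
>> numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 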
>>
>>
>>
>> ----- Original Message ----- 
>> From: "Milosz Tanski" <milosz@adfin.com> 
>> To: "aderumier" <aderumier@odiso.com> 
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" 
>> <ceph-users@lists.ceph.com> 
>> Sent: Wednesday, 22 April 2015 12:54:23 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>>
>>
>>
>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
>>
>>
>> I wonder if it could be numa related. 
>> 
>> I'm using centos 7.1, 
>> and auto numa balancing is enabled: 
>> 
>> cat /proc/sys/kernel/numa_balancing = 1 
>> 
>> Maybe the osd daemons access buffers on the wrong numa node. 
>> 
>> I'll try to reproduce the problem. 
>>
>>
>>
>> Can you force the degenerate case using numactl, to either affirm or deny your suspicion? 
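>> 
>> For example, deliberately mismatching the cpu and memory nodes should 
>> reproduce the slow case (a sketch only; adjust the osd id to your 
>> setup): 
>> 
>> numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 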
>>
>>
>>
>>
>> ----- Original Message ----- 
>> From: "aderumier" <aderumier@odiso.com> 
>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>> Sent: Wednesday, 22 April 2015 10:40:05 
>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon 
>> improve performance from 100k iops to 300k iops 
>>
>> [the original benchmark report, fio results, and ceph cluster logs were 
>> quoted here in full; snipped, since they appear earlier in the thread] 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-23 11:33                             ` [ceph-users] " Mark Nelson
@ 2015-04-23 11:56                               ` Alexandre DERUMIER
  2015-04-23 16:24                                 ` Somnath Roy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-23 11:56 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Srinivasula Maram, ceph-users, ceph-devel, Milosz Tanski

>> If you have the means to compile the same version of ceph with jemalloc, 
>> I would be very interested to see how it does. 

Yes, sure. (I have around 3-4 weeks to do all the benchmarks.) 

But I don't know how to do it yet. 
I'm running the cluster on centos 7.1; maybe it would be easy to patch the srpms and rebuild the package with jemalloc, along these lines: 
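
Untested, and assuming the spec file can simply be switched to run 
configure with --with-jemalloc:

yumdownloader --source ceph
rpm -ivh ceph-*.src.rpm
# edit ~/rpmbuild/SPECS/ceph.spec to add --with-jemalloc
rpmbuild -ba ~/rpmbuild/SPECS/ceph.spec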



----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Thursday, 23 April 2015 13:33:00
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Thanks for the testing, Alexandre! 

If you have the means to compile the same version of ceph with jemalloc, 
I would be very interested to see how it does. 

In some ways I'm glad it turned out not to be NUMA. I still suspect we 
will have to deal with it at some point, but perhaps not today. ;) 

Mark 

On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
> Maybe it's tcmalloc related 
> I thinked to have patched it correctly, but perf show a lot of tcmalloc::ThreadCache::ReleaseToCentralCache 
> 
> before osd restart (100k) 
> ------------------ 
> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
> 8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
> 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans 
> 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
> 1.63% swapper [kernel.kallsyms] [k] intel_idle 
> 1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
> 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
> 1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
> 0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 
> 0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 
> 0.81% ceph-osd ceph-osd [.] Mutex::Lock 
> 0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
> 0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 
> 0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 
> 0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe 
> 0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit 
> 0.58% ceph-osd ceph-osd [.] operator< 
> 0.57% ceph-osd [kernel.kallsyms] [k] __schedule 
> 0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
> 0.54% swapper [kernel.kallsyms] [k] __schedule 
> 
> 
> after osd restart (300k iops) 
> ------------------------------ 
> 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
> 1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
> 1.86% swapper [kernel.kallsyms] [k] intel_idle 
> 1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
> 1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
> 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 
> 1.23% ceph-osd ceph-osd [.] Mutex::Lock 
> 1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 
> 1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
> 0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 
> 0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 
> 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
> 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 
> 0.70% ceph-osd ceph-osd [.] Message::Message 
> 0.68% ceph-osd [kernel.kallsyms] [k] __schedule 
> 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 
> 0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
> 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 
> 0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
> 0.60% swapper [kernel.kallsyms] [k] __schedule 
> 0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b 
> 0.57% ceph-osd ceph-osd [.] operator< 
> 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 
> 0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 
> 0.55% ceph-osd [kernel.kallsyms] [k] __switch_to 
> 0.54% ceph-osd libc-2.17.so [.] vfprintf 
> 0.52% ceph-osd [kernel.kallsyms] [k] fget_light 
> 
> ----- Mail original ----- 
> De: "aderumier" <aderumier@odiso.com> 
> À: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Envoyé: Jeudi 23 Avril 2015 10:00:34 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> I'm hitting this bug again today. 
> 
> So don't seem to be numa related (I have try to flush linux buffer to be sure). 
> 
> and tcmalloc is patched (I don't known how to verify that it's ok). 
> 
> I don't have restarted osd yet. 
> 
> Maybe some perf trace could be usefulll ? 
> 
> 
> ----- Mail original ----- 
> De: "aderumier" <aderumier@odiso.com> 
> À: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Envoyé: Mercredi 22 Avril 2015 18:30:26 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> 
>>> I feel it is due to tcmalloc issue 
> 
> Indeed, I had patched one of my node, but not the other. 
> So maybe I have hit this bug. (but I can't confirm, I don't have traces). 
> 
> But numa interleaving seem to help in my case (maybe not from 100->300k, but 250k->300k). 
> 
> I need to do more long tests to confirm that. 
> 
> 
> ----- Mail original ----- 
> De: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
> À: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com> 
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
> Envoyé: Mercredi 22 Avril 2015 16:34:33 
> Objet: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> I feel it is due to tcmalloc issue 
> 
> I have seen similar issue in my setup after 20 days. 
> 
> Thanks, 
> Srinivas 
> 
> 
> 
> -----Original Message----- 
> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson 
> Sent: Wednesday, April 22, 2015 7:31 PM 
> To: Alexandre DERUMIER; Milosz Tanski 
> Cc: ceph-devel; ceph-users 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi Alexandre, 
> 
> We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this. 
> 
> Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs. 
> 
> Mark 
> 
> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> I have done a lot of test today, and it seem indeed numa related. 
>> 
>> My numastat was 
>> 
>> # numastat 
>>                      node0       node1 
>> numa_hit          99075422   153976877 
>> numa_miss        167490965     1493663 
>> numa_foreign       1493663   167491417 
>> interleave_hit      157745      167015 
>> local_node        99049179   153830554 
>> other_node       167517697     1639986 
>> 
>> So, a lot of misses. 
>> 
>> In this case, I can reproduce ios going from 85k to 300k iops, up and down. 
>> 
>> now setting 
>> echo 0 > /proc/sys/kernel/numa_balancing 
>> 
>> and starting osd daemons with 
>> 
>> numactl --interleave=all /usr/bin/ceph-osd 
>> 
>> 
>> I have a constant 300k iops ! 
>> 
>> 
>> I wonder if it could be improved by binding osd daemons to specific numa nodes (see the sketch below). 
>> I have 2 numa nodes of 10 cores with 6 osds, but I think it also requires some ceph.conf osd thread tuning. 
>> 
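>> A rough sketch of what that binding could look like (osd ids, node 
>> numbers and the thread count are made up for illustration, untested): 
>> 
>> # pin cpu and memory of each osd to one socket, half the osds per node 
>> numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 --cluster ceph 
>> numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 3 --cluster ceph 
>> 
>> # and fewer threads per osd in ceph.conf, e.g. 
>> [osd] 
>> osd op threads = 4 
>> 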
>> 
>> 
>> ----- Mail original ----- 
>> De: "Milosz Tanski" <milosz@adfin.com> 
>> À: "aderumier" <aderumier@odiso.com> 
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" 
>> <ceph-users@lists.ceph.com> 
>> Envoyé: Mercredi 22 Avril 2015 12:54:23 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> 
>> 
>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
>> 
>> 
>> I wonder if it could be numa related, 
>> 
>> I'm using centos 7.1, 
>> and auto numa balancing is enabled 
>> 
>> cat /proc/sys/kernel/numa_balancing = 1 
>> 
>> Maybe the osd daemons access buffers on the wrong numa node. 
>> 
>> I'll try to reproduce the problem 
>> 
>> 
>> 
>> Can you force the degenerate case using numactl? To either affirm or deny your suspicion. 
>> 
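>> For example, deliberately running an osd's threads on one node with its 
>> memory forced onto the other should reproduce the slow case (a sketch, 
>> untested): 
>> 
>> numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 --cluster ceph 
>> 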
>> 
>> 
>> 
>> ----- Mail original ----- 
>> De: "aderumier" <aderumier@odiso.com> 
>> À: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>> Envoyé: Mercredi 22 Avril 2015 10:40:05 
>> Objet: [ceph-users] strange benchmark problem : restarting osd daemon 
>> improve performance from 100k iops to 300k iops 
>> 
>> Hi, 
>> 
>> I was doing some benchmarks, 
>> I have found a strange behaviour. 
>> 
>> Using fio with the rbd engine, I was able to reach around 100k iops. 
>> (osd data in linux buffers, iostat shows 0% disk access) 
>> 
>> then after restarting all osd daemons, 
>> 
>> the same fio benchmark now shows around 300k iops. 
>> (osd data in linux buffers, iostat shows 0% disk access) 
>> 
>> 
>> any ideas? 
>> 
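>> For reference, a job file like fiorbd is roughly this (the client, pool 
>> and image names here are placeholders, not the real ones): 
>> 
>> [global] 
>> ioengine=rbd 
>> clientname=admin 
>> pool=rbd 
>> rbdname=fio-test 
>> rw=randread 
>> bs=4k 
>> iodepth=32 
>> numjobs=10 
>> 
>> [rbd_iodepth32-test] 
>> 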
>> 
>> 
>> 
>> before restarting osd 
>> --------------------- 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
>> ... 
>> fio-2.2.7-10-g51e9 
>> Starting 10 processes 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 iops] [eta 14m:45s] 
>> fio: terminating on signal 2 
>> 
>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015 
>>   read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec 
>>     slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 
>>     clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 
>>      lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42 
>>     clat percentiles (usec): 
>>      |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247], 
>>      | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660], 
>>      | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
>>      | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
>>      | 99.99th=[47360] 
>>     bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95 
>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79% 
>>     lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% 
>>     lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% 
>>     lat (msec) : 100=0.01% 
>>   cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 
>>   IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% 
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>      complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0% 
>>      issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>>      latency   : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>>    READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, mint=26215msec, maxt=26215msec 
>> 
>> Disk stats (read/write): 
>>   sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
>> 
>> 
>> 
>> 
>> AFTER RESTARTING OSDS 
>> ---------------------- 
>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
>> ... 
>> fio-2.2.7-10-g51e9 
>> Starting 10 processes 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops] [eta 01h:09m:27s] 
>> fio: terminating on signal 2 
>> 
>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015 
>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt= 7389msec 
>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12 
>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28 
>>     clat percentiles (usec): 
>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450], 
>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796], 
>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>>      | 99.99th=[436224] 
>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82 
>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23% 
>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% 
>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% 
>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01% 
>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832 
>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% 
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% 
>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>>      latency   : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>>    READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
>> 
>> Disk stats (read/write): 
>>   sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
>> 
>> 
>> 
>> 
>> CEPH LOG 
>> -------- 
>> 
>> before restarting osd 
>> ---------------------- 
>> 
>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
>> 
>> 
>> restarting osd 
>> --------------- 
>> 
>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
>> [INF] osd.0 marked itself down 
>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
>> kB/s rd, 130 op/s 
>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
>> [INF] osd.1 marked itself down 
>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
>> [INF] osd.2 marked itself down 
>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
>> [INF] osd.1 10.7.0.152:6800/14848 boot 
>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
>> [INF] osd.3 marked itself down 
>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
>> [INF] osd.6 marked itself down 
>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
>> [INF] osd.2 10.7.0.152:6812/15410 boot 
>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
>> [INF] osd.3 10.7.0.152:6816/15971 boot 
>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
>> [INF] osd.7 marked itself down 
>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
>> [INF] osd.0 10.7.0.152:6804/14481 boot 
>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
>> [INF] osd.6 10.7.0.153:6808/21955 boot 
>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
>> 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
>> [INF] osd.8 marked itself down 
>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
>> 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
>> [INF] osd.7 10.7.0.153:6804/22536 boot 
>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
>> stale+active+remapped, 503 stale+active+clean, 4 
>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 GB 
>> / 1295 GB avail 
>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
>> stale+active+remapped, 503 stale+active+clean, 4 
>> active+undersized+degraded+remapped, 5 peering, 13 active 
>> 
>> 
>> AFTER OSD RESTART 
>> ------------------ 
>> 
>> 
>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-23 11:56                               ` Alexandre DERUMIER
@ 2015-04-23 16:24                                 ` Somnath Roy
       [not found]                                   ` <755F6B91B3BE364F9BCA11EA3F9E0C6F2CD806BE-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Somnath Roy @ 2015-04-23 16:24 UTC (permalink / raw)
  To: Alexandre DERUMIER, Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

Alexandre,
You can configure with --with-jemalloc or ./do_autogen -J to build ceph with jemalloc.
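
From a source checkout that boils down to something like this (a sketch of
the usual autotools flow; only --with-jemalloc itself is from this thread):

./autogen.sh
./configure --with-jemalloc
make -j$(nproc)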

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
Sent: Thursday, April 23, 2015 4:56 AM
To: Mark Nelson
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

>> If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does.

Yes, sure. (I have around 3-4 weeks to do all the benchmarks.)

But I don't know how to do it.
I'm running the cluster on centos 7.1; maybe the easiest is to patch the srpm and rebuild the package with jemalloc (a rough sketch below).
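
Something like this might work (a sketch; the srpm name and the spec change
are guesses, not taken from this thread):

yum install rpm-build jemalloc-devel
rpm -ivh ceph-0.94.1-0.el7.src.rpm          # hypothetical srpm name
# edit ~/rpmbuild/SPECS/ceph.spec so %configure is passed --with-jemalloc
rpmbuild -ba ~/rpmbuild/SPECS/ceph.spec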



----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Envoyé: Jeudi 23 Avril 2015 13:33:00
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Thanks for the testing Alexandre! 

If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does. 

In some ways I'm glad it turned out not to be NUMA. I still suspect we will have to deal with it at some point, but perhaps not today. ;) 

Mark 

On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
> Maybe it's tcmalloc related. 
> I thought I had patched it correctly, but perf shows a lot of 
> tcmalloc::ThreadCache::ReleaseToCentralCache
> 
> before osd restart (100k) 
> ------------------ 
> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans 
>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>  1.63% swapper [kernel.kallsyms] [k] intel_idle 
>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>  1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>  0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 
>  0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 
>  0.81% ceph-osd ceph-osd [.] Mutex::Lock 
>  0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
>  0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 
>  0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 
>  0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe 
>  0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit 
>  0.58% ceph-osd ceph-osd [.] operator< 
>  0.57% ceph-osd [kernel.kallsyms] [k] __schedule 
>  0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
>  0.54% swapper [kernel.kallsyms] [k] __schedule 
> 
> 
> after osd restart (300k iops) 
> ------------------------------ 
>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>  1.86% swapper [kernel.kallsyms] [k] intel_idle 
>  1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>  1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 
>  1.23% ceph-osd ceph-osd [.] Mutex::Lock 
>  1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 
>  1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
>  0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 
>  0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 
>  0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
>  0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 
>  0.70% ceph-osd ceph-osd [.] Message::Message 
>  0.68% ceph-osd [kernel.kallsyms] [k] __schedule 
>  0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 
>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>  0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 
>  0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
>  0.60% swapper [kernel.kallsyms] [k] __schedule 
>  0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b 
>  0.57% ceph-osd ceph-osd [.] operator< 
>  0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 
>  0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 
>  0.55% ceph-osd [kernel.kallsyms] [k] __switch_to 
>  0.54% ceph-osd libc-2.17.so [.] vfprintf 
>  0.52% ceph-osd [kernel.kallsyms] [k] fget_light 
> 
>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> 1295 GB avail; 593 MB/s rd, 148 kop/s
>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> 1295 GB avail; 938 MB/s rd, 234 kop/s
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                   ` <755F6B91B3BE364F9BCA11EA3F9E0C6F2CD806BE-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
@ 2015-04-24 11:37                                     ` Irek Fasikhov
       [not found]                                       ` <CAF-rypyB=CjjJQs_+3Q=ELCVwpg9nyWKdBu8gVyuDQa49=GHtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Irek Fasikhov @ 2015-04-24 11:37 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-users, ceph-devel, Milosz Tanski


[-- Attachment #1.1: Type: text/plain, Size: 38224 bytes --]

Hi, Alexandre!
Have you tried changing the parameter vm.min_free_kbytes?
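For example (just a sketch; the value below is a placeholder, and the right
setting depends on how much RAM the node has):

  # check the current value
  cat /proc/sys/vm/min_free_kbytes
  # raise it at runtime (example value only)
  sysctl -w vm.min_free_kbytes=524288
  # keep it across reboots
  echo 'vm.min_free_kbytes = 524288' >> /etc/sysctl.conf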

2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>:

> Alexandre,
> You can configure with --with-jemalloc or ./do_autogen -J to build ceph
> with jemalloc.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org] On Behalf Of
> Alexandre DERUMIER
> Sent: Thursday, April 23, 2015 4:56 AM
> To: Mark Nelson
> Cc: ceph-users; ceph-devel; Milosz Tanski
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> daemon improve performance from 100k iops to 300k iops
>
> >>If you have the means to compile the same version of ceph with
> >>jemalloc, I would be very interested to see how it does.
>
> Yes, sure. (I have around 3-4 weeks to do all the benchmarks)
>
> But I don't know how to do it.
> I'm running the cluster on centos7.1; maybe it would be easy to patch the
> srpms to rebuild the package with jemalloc.
>
>
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>, "Srinivasula Maram" <
> Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel" <
> ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> Sent: Thursday, April 23, 2015 13:33:00
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon
> improve performance from 100k iops to 300k iops
>
> Thanks for the testing Alexandre!
>
> If you have the means to compile the same version of ceph with jemalloc, I
> would be very interested to see how it does.
>
> In some ways I'm glad it turned out not to be NUMA. I still suspect we
> will have to deal with it at some point, but perhaps not today. ;)
>
> Mark
>
> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
> > Maybe it's tcmalloc related.
> > I thought I had patched it correctly, but perf shows a lot of
> > tcmalloc::ThreadCache::ReleaseToCentralCache
> >
> > before osd restart (100k)
> > ------------------
> > 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
> >  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
> >  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
> >  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
> >  1.63% swapper [kernel.kallsyms] [k] intel_idle
> >  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
> >  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
> >  1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
> >  0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
> >  0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
> >  0.81% ceph-osd ceph-osd [.] Mutex::Lock
> >  0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
> >  0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
> >  0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
> >  0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe
> >  0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit
> >  0.58% ceph-osd ceph-osd [.] operator<
> >  0.57% ceph-osd [kernel.kallsyms] [k] __schedule
> >  0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
> >  0.54% swapper [kernel.kallsyms] [k] __schedule
> >
> >
> > after osd restart (300k iops)
> > ------------------------------
> >  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
> >  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
> >  1.86% swapper [kernel.kallsyms] [k] intel_idle
> >  1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
> >  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
> >  1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
> >  1.23% ceph-osd ceph-osd [.] Mutex::Lock
> >  1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
> >  1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
> >  0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
> >  0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
> >  0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
> >  0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg
> >  0.70% ceph-osd ceph-osd [.] Message::Message
> >  0.68% ceph-osd [kernel.kallsyms] [k] __schedule
> >  0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu
> >  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
> >  0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe
> >  0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
> >  0.60% swapper [kernel.kallsyms] [k] __schedule
> >  0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b
> >  0.57% ceph-osd ceph-osd [.] operator<
> >  0.57% ceph-osd ceph-osd [.] crc32_iscsi_00
> >  0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose
> >  0.55% ceph-osd [kernel.kallsyms] [k] __switch_to
> >  0.54% ceph-osd libc-2.17.so [.] vfprintf
> >  0.52% ceph-osd [kernel.kallsyms] [k] fget_light
> >
> > ----- Original Message -----
> > From: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
> > To: "Srinivasula Maram" <Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel"
> > <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> > Sent: Thursday, April 23, 2015 10:00:34
> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> > daemon improve performance from 100k iops to 300k iops
> >
> > Hi,
> > I'm hitting this bug again today.
> >
> > So it doesn't seem to be numa related (I have tried to flush the linux
> > buffer to be sure).
> >
> > and tcmalloc is patched (I don't know how to verify that it's ok).
> >
> > I haven't restarted the osds yet.
> >
> > Maybe some perf traces could be useful?
> >
> >
> > ----- Original Message -----
> > From: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
> > To: "Srinivasula Maram" <Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel"
> > <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> > Sent: Wednesday, April 22, 2015 18:30:26
> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> > daemon improve performance from 100k iops to 300k iops
> >
> > Hi,
> >
> >>> I feel it is due to tcmalloc issue
> >
> > Indeed, I had patched one of my nodes, but not the other.
> > So maybe I have hit this bug (but I can't confirm, I don't have traces).
> >
> > But numa interleaving seems to help in my case (maybe not from 100->300k,
> > but 250k->300k).
> >
> > I need to do longer tests to confirm that.
> >
> >
> > ----- Original Message -----
> > From: "Srinivasula Maram" <Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > To: "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "aderumier"
> > <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> > Cc: "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "ceph-users"
> > <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
> > Sent: Wednesday, April 22, 2015 16:34:33
> > Subject: RE: [ceph-users] strange benchmark problem : restarting osd
> > daemon improve performance from 100k iops to 300k iops
> >
> > I feel it is due to the tcmalloc issue.
> >
> > I have seen a similar issue in my setup after 20 days.
> >
> > Thanks,
> > Srinivas
> >
> >
> >
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org] On Behalf
> > Of Mark Nelson
> > Sent: Wednesday, April 22, 2015 7:31 PM
> > To: Alexandre DERUMIER; Milosz Tanski
> > Cc: ceph-devel; ceph-users
> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> > daemon improve performance from 100k iops to 300k iops
> >
> > Hi Alexandre,
> >
> > We should discuss this at the perf meeting today. We knew NUMA node
> affinity issues were going to crop up sooner or later (and indeed already
> have in some cases), but this is pretty major. It's probably time to really
> dig in and figure out how to deal with this.
> >
> > Note: this is one of the reasons I like small nodes with single sockets
> and fewer OSDs.
> >
> > Mark
> >
> > On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
> >> Hi,
> >>
> >> I have done a lot of tests today, and it seems indeed numa related.
> >>
> >> My numastat was
> >>
> >> # numastat
> >> node0 node1
> >> numa_hit 99075422 153976877
> >> numa_miss 167490965 1493663
> >> numa_foreign 1493663 167491417
> >> interleave_hit 157745 167015
> >> local_node 99049179 153830554
> >> other_node 167517697 1639986
> >>
> >> So, a lot of misses.
> >>
> >> In this case, I can reproduce ios going from 85k to 300k iops, up and
> >> down.
> >>
> >> now setting
> >> echo 0 > /proc/sys/kernel/numa_balancing
> >>
> >> and starting osd daemons with
> >>
> >> numactl --interleave=all /usr/bin/ceph-osd
> >>
> >>
> >> I have a constant 300k iops !
> >>
> >>
> >> I wonder if it could be improved by binding osd daemons to a specific
> >> numa node.
> >> I have 2 numa nodes of 10 cores with 6 osds, but I think it also requires
> >> ceph.conf osd thread tuning.
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> >> To: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
> >> Cc: "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "ceph-users"
> >> <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
> >> Sent: Wednesday, April 22, 2015 12:54:23
> >> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> >> daemon improve performance from 100k iops to 300k iops
> >>
> >>
> >>
> >> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <
> aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org > wrote:
> >>
> >>
> >> I wonder if it could be numa related,
> >>
> >> I'm using centos 7.1,
> >> and auto numa balancing is enabled
> >>
> >> cat /proc/sys/kernel/numa_balancing = 1
> >>
> >> Maybe the osd daemon accesses buffers on the wrong numa node.
> >>
> >> I'll try to reproduce the problem
> >>
> >>
> >>
> >> Can you force the degenerate case using numactl? To either affirm or
> deny your suspicion.
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org >
> >> To: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >, "ceph-users" <
> >> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org >
> >> Sent: Wednesday, April 22, 2015 10:40:05
> >> Subject: [ceph-users] strange benchmark problem : restarting osd daemon
> >> improve performance from 100k iops to 300k iops
> >>
> >> Hi,
> >>
> >> I was doing some benchmarks,
> >> I have found an strange behaviour.
> >>
> >> Using fio with rbd engine, I was able to reach around 100k iops.
> >> (osd datas in linux buffer, iostat show 0% disk access)
> >>
> >> then after restarting all osd daemons,
> >>
> >> the same fio benchmark show now around 300k iops.
> >> (osd datas in linux buffer, iostat show 0% disk access)
> >>
> >>
> >> any ideas?
> >>
> >>
> >>
> >>
> >> before restarting osd
> >> ---------------------
> >> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> >> ioengine=rbd, iodepth=32 ...
> >> fio-2.2.7-10-g51e9
> >> Starting 10 processes
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0
> >> iops] [eta 14m:45s]
> >> fio: terminating on signal 2
> >>
> >> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015
> >>   read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec
> >>     slat (usec): min=5, max=3685, avg=16.89, stdev=17.38
> >>     clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23
> >>      lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42
> >>     clat percentiles (usec):
> >>      |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247],
> >>      | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660],
> >>      | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
> >>      | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
> >>      | 99.99th=[47360]
> >>     bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95
> >>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79%
> >>     lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03%
> >>     lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37%
> >>     lat (msec) : 100=0.01%
> >>   cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710
> >>   IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0%
> >>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>      complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
> >>      issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> >>      latency   : target=0, window=0, percentile=100.00%, depth=32
> >>
> >> Run status group 0 (all jobs):
> >> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s,
> >> mint=26215msec, maxt=26215msec
> >>
> >> Disk stats (read/write):
> >> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
> >> [root@ceph1-3 fiorbd]# ./fio fiorbd
> >> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> >> ioengine=rbd, iodepth=32
> >>
> >>
> >>
> >>
> >> AFTER RESTARTING OSDS
> >> ----------------------
> >> [root@ceph1-3 fiorbd]# ./fio fiorbd
> >> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> >> ioengine=rbd, iodepth=32 ...
> >> fio-2.2.7-10-g51e9
> >> Starting 10 processes
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> rbd engine: RBD version: 0.1.9
> >> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0
> >> iops] [eta 01h:09m:27s]
> >> fio: terminating on signal 2
> >>
> >> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
> >>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt= 7389msec
> >>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
> >>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
> >>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
> >>     clat percentiles (usec):
> >>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
> >>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
> >>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
> >>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
> >>      | 99.99th=[436224]
> >>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
> >>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
> >>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
> >>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
> >>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
> >>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
> >>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
> >>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
> >>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> >>      latency   : target=0, window=0, percentile=100.00%, depth=32
> >>
> >> Run status group 0 (all jobs):
> >> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
> >> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
> >>
> >> Disk stats (read/write):
> >> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
> >>
> >>
> >>
> >>
> >> CEPH LOG
> >> --------
> >>
> >> before restarting osd
> >> ----------------------
> >>
> >> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster
> >> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 298 MB/s rd, 76465 op/s
> >> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster
> >> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 333 MB/s rd, 85355 op/s
> >> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster
> >> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 343 MB/s rd, 87932 op/s
> >> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster
> >> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 328 MB/s rd, 84151 op/s
> >> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster
> >> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 237 MB/s rd, 60855 op/s
> >> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster
> >> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 144 MB/s rd, 36935 op/s
> >> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster
> >> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 321 MB/s rd, 82334 op/s
> >> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster
> >> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 368 MB/s rd, 94211 op/s
> >> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster
> >> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 244 MB/s rd, 62644 op/s
> >> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster
> >> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 175 MB/s rd, 44997 op/s
> >> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster
> >> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 122 MB/s rd, 31259 op/s
> >> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster
> >> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 229 MB/s rd, 58674 op/s
> >> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster
> >> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 271 MB/s rd, 69501 op/s
> >> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster
> >> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 211 MB/s rd, 54020 op/s
> >> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster
> >> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 164 MB/s rd, 42001 op/s
> >> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster
> >> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 134 MB/s rd, 34380 op/s
> >> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster
> >> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 293 MB/s rd, 75213 op/s
> >> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster
> >> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 337 MB/s rd, 86353 op/s
> >> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster
> >> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 229 MB/s rd, 58839 op/s
> >> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster
> >> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
> >> 1295 GB avail; 152 MB/s rd, 39117 op/s
> >>
> >>
> >> restarting osd
> >> ---------------
> >>
> >> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
> >> [INF] osd.0 marked itself down
> >> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
> >> [INF] osdmap e849: 9 osds: 8 up, 9 in
> >> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
> >> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
> >> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
> >> 794
> >> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516
> >> kB/s rd, 130 op/s
> >> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
> >> [INF] osdmap e850: 9 osds: 8 up, 9 in
> >> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
> >> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
> >> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
> >> 794
> >> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
> >> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
> >> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
> >> 794
> >> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
> >> [INF] osd.1 marked itself down
> >> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
> >> [INF] osdmap e851: 9 osds: 7 up, 9 in
> >> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
> >> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
> >> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
> >> 684
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
> >> [INF] osdmap e852: 9 osds: 7 up, 9 in
> >> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
> >> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
> >> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
> >> 684
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
> >> [INF] osd.2 marked itself down
> >> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
> >> [INF] osdmap e853: 9 osds: 6 up, 9 in
> >> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
> >> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
> >> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
> >> 600
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
> >> [INF] osdmap e854: 9 osds: 6 up, 9 in
> >> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
> >> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
> >> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
> >> 600
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
> >> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
> >> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
> >> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
> >> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
> >> [INF] osd.1 10.7.0.152:6800/14848 boot
> >> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
> >> [INF] osdmap e855: 9 osds: 7 up, 9 in
> >> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
> >> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
> >> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
> >> 600
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
> >> [INF] osd.3 marked itself down
> >> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
> >> [INF] osdmap e856: 9 osds: 6 up, 9 in
> >> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
> >> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
> >> stale+active+remapped, 394 stale+active+clean, 37 active+remapped,
> >> 506
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
> >> [INF] osd.6 marked itself down
> >> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
> >> [INF] osd.2 10.7.0.152:6812/15410 boot
> >> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
> >> [INF] osdmap e857: 9 osds: 6 up, 9 in
> >> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
> >> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
> >> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
> >> 398
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
> >> [INF] osdmap e858: 9 osds: 6 up, 9 in
> >> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
> >> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
> >> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
> >> 398
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
> >> [INF] osd.3 10.7.0.152:6816/15971 boot
> >> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
> >> [INF] osdmap e859: 9 osds: 7 up, 9 in
> >> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
> >> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
> >> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
> >> 398
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
> >> [INF] osd.7 marked itself down
> >> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
> >> [INF] osd.0 10.7.0.152:6804/14481 boot
> >> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
> >> [INF] osdmap e860: 9 osds: 7 up, 9 in
> >> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
> >> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
> >> stale+active+remapped, 608 stale+active+clean, 27 active+remapped,
> >> 292
> >> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
> >> used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
> >> [INF] osd.6 10.7.0.153:6808/21955 boot
> >> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
> >> [INF] osdmap e861: 9 osds: 8 up, 9 in
> >> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
> >> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
> >> stale+active+remapped, 498 stale+active+clean, 2 peering, 28
> >> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
> >> 420 GB used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
> >> [INF] osd.8 marked itself down
> >> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
> >> [INF] osdmap e862: 9 osds: 7 up, 9 in
> >> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
> >> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
> >> stale+active+remapped, 587 stale+active+clean, 2 peering, 11
> >> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
> >> 420 GB used, 874 GB / 1295 GB avail
> >> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
> >> [INF] osd.7 10.7.0.153:6804/22536 boot
> >> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
> >> [INF] osdmap e863: 9 osds: 8 up, 9 in
> >> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
> >> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
> >> stale+active+remapped, 503 stale+active+clean, 4
> >> active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
> >> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874
> >> GB / 1295 GB avail
> >> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
> >> [INF] osdmap e864: 9 osds: 8 up, 9 in
> >> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
> >> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
> >> stale+active+remapped, 503 stale+active+clean, 4
> >> active+undersized+degraded+remapped, 5 peering, 13 active
> >>
> >>
> >> AFTER OSD RESTART
> >> ------------------
> >>
> >>
> >> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
> >> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 786 MB/s rd, 196 kop/s
> >> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
> >> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 1578 MB/s rd, 394 kop/s
> >> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
> >> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 932 MB/s rd, 233 kop/s
> >> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
> >> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 627 MB/s rd, 156 kop/s
> >> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
> >> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 1034 MB/s rd, 258 kop/s
> >> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
> >> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 529 MB/s rd, 132 kop/s
> >> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
> >> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 770 MB/s rd, 192 kop/s
> >> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
> >> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 1358 MB/s rd, 339 kop/s
> >> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
> >> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 649 MB/s rd, 162 kop/s
> >> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
> >> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 580 MB/s rd, 145 kop/s
> >> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
> >> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 962 MB/s rd, 240 kop/s
> >> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
> >> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 506 MB/s rd, 126 kop/s
> >> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
> >> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 774 MB/s rd, 193 kop/s
> >> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
> >> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 1363 MB/s rd, 340 kop/s
> >> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
> >> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 662 MB/s rd, 165 kop/s
> >> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
> >> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 593 MB/s rd, 148 kop/s
> >> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
> >> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
> >> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
> >> 1295 GB avail; 938 MB/s rd, 234 kop/s
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>



-- 
Best regards, Irek Nurgayazovich Fasikhov
Mob.: +79229045757

[-- Attachment #1.2: Type: text/html, Size: 52993 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                       ` <CAF-rypyB=CjjJQs_+3Q=ELCVwpg9nyWKdBu8gVyuDQa49=GHtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-04-24 16:38                                         ` Alexandre DERUMIER
       [not found]                                           ` <1270073308.621430804.1429893511370.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-24 16:38 UTC (permalink / raw)
  To: ceph-users, ceph-devel; +Cc: Milosz Tanski

Hi,

I have finished rebuilding ceph with jemalloc,

and everything seems to be working fine.

I'm getting a constant 300k iops for the moment, so no speed regression.

I'll run longer benchmarks next week.
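For anyone who wants to reproduce the build, it went roughly along these
lines (a minimal sketch of the --with-jemalloc route Somnath mentioned;
package and path names are assumptions and may differ on your distro):

  # install the jemalloc headers (EPEL on centos 7.1)
  yum install -y jemalloc-devel
  # rebuild ceph from its source tree against jemalloc
  cd ceph
  ./autogen.sh
  ./configure --with-jemalloc
  make -j"$(nproc)"
  # check which allocator the osd binary actually links against
  ldd ./src/ceph-osd | grep -E 'jemalloc|tcmalloc'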

Regards,

Alexandre

----- Original Message -----
From: "Irek Fasikhov" <malmyzh@gmail.com>
To: "Somnath Roy" <Somnath.Roy@sandisk.com>
Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Friday, April 24, 2015 13:37:52
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi, Alexandre!
Have you tried changing the parameter vm.min_free_kbytes?

2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy@sandisk.com > : 


Alexandre, 
You can configure with --with-jemalloc or ./do_autogen -J to build ceph with jemalloc. 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com ] On Behalf Of Alexandre DERUMIER 
Sent: Thursday, April 23, 2015 4:56 AM 
To: Mark Nelson 
Cc: ceph-users; ceph-devel; Milosz Tanski 
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 

>>If you have the means to compile the same version of ceph with 
>>jemalloc, I would be very interested to see how it does. 

Yes, sure. (I have around 3-4 weeks to do all the benchmarks)

But I don't know how to do it.
I'm running the cluster on centos7.1; maybe it would be easy to patch the srpms to rebuild the package with jemalloc.



----- Original Message -----
From: "Mark Nelson" < mnelson@redhat.com >
To: "aderumier" < aderumier@odiso.com >, "Srinivasula Maram" < Srinivasula.Maram@sandisk.com >
Cc: "ceph-users" < ceph-users@lists.ceph.com >, "ceph-devel" < ceph-devel@vger.kernel.org >, "Milosz Tanski" < milosz@adfin.com >
Sent: Thursday, April 23, 2015 13:33:00
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Thanks for the testing Alexandre! 

If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does. 

In some ways I'm glad it turned out not to be NUMA. I still suspect we will have to deal with it at some point, but perhaps not today. ;) 

Mark 

On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
> Maybe it's tcmalloc related.
> I thought I had patched it correctly, but perf shows a lot of
> tcmalloc::ThreadCache::ReleaseToCentralCache 
> 
> before osd restart (100k) 
> ------------------ 
> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>  1.63% swapper [kernel.kallsyms] [k] intel_idle
>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>  1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>  0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>  0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>  0.81% ceph-osd ceph-osd [.] Mutex::Lock
>  0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>  0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>  0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>  0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe
>  0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit
>  0.58% ceph-osd ceph-osd [.] operator<
>  0.57% ceph-osd [kernel.kallsyms] [k] __schedule
>  0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>  0.54% swapper [kernel.kallsyms] [k] __schedule
> 
> 
> after osd restart (300k iops) 
> ------------------------------ 
>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>  1.86% swapper [kernel.kallsyms] [k] intel_idle
>  1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>  1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>  1.23% ceph-osd ceph-osd [.] Mutex::Lock
>  1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>  1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>  0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>  0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>  0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>  0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg
>  0.70% ceph-osd ceph-osd [.] Message::Message
>  0.68% ceph-osd [kernel.kallsyms] [k] __schedule
>  0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu
>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>  0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe
>  0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>  0.60% swapper [kernel.kallsyms] [k] __schedule
>  0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b
>  0.57% ceph-osd ceph-osd [.] operator<
>  0.57% ceph-osd ceph-osd [.] crc32_iscsi_00
>  0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose
>  0.55% ceph-osd [kernel.kallsyms] [k] __switch_to
>  0.54% ceph-osd libc-2.17.so [.] vfprintf
>  0.52% ceph-osd [kernel.kallsyms] [k] fget_light
> 
> ----- Original Message -----
> From: "aderumier" < aderumier@odiso.com >
> To: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com >
> Cc: "ceph-users" < ceph-users@lists.ceph.com >, "ceph-devel"
> < ceph-devel@vger.kernel.org >, "Milosz Tanski" < milosz@adfin.com >
> Sent: Thursday, April 23, 2015 10:00:34
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> I'm hitting this bug again today. 
> 
> So it doesn't seem to be numa related (I have tried to flush the linux buffer to be sure).
>
> and tcmalloc is patched (I don't know how to verify that it's ok).
>
> I haven't restarted the osds yet.
>
> Maybe some perf traces could be useful?
> 
> 
> ----- Original Message -----
> From: "aderumier" < aderumier@odiso.com >
> To: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com >
> Cc: "ceph-users" < ceph-users@lists.ceph.com >, "ceph-devel"
> < ceph-devel@vger.kernel.org >, "Milosz Tanski" < milosz@adfin.com >
> Sent: Wednesday, April 22, 2015 18:30:26
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> 
>>> I feel it is due to tcmalloc issue 
> 
> Indeed, I had patched one of my nodes, but not the other.
> So maybe I have hit this bug (but I can't confirm, I don't have traces).
>
> But numa interleaving seems to help in my case (maybe not from 100->300k, but 250k->300k).
>
> I need to do longer tests to confirm that.
> 
> 
> ----- Original Message -----
> From: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com >
> To: "Mark Nelson" < mnelson@redhat.com >, "aderumier"
> < aderumier@odiso.com >, "Milosz Tanski" < milosz@adfin.com >
> Cc: "ceph-devel" < ceph-devel@vger.kernel.org >, "ceph-users"
> < ceph-users@lists.ceph.com >
> Sent: Wednesday, April 22, 2015 16:34:33
> Subject: RE: [ceph-users] strange benchmark problem : restarting osd
> daemon improve performance from 100k iops to 300k iops 
> 
> I feel it is due to the tcmalloc issue.
>
> I have seen a similar issue in my setup after 20 days.
> 
> Thanks, 
> Srinivas 
> 
> 
> 
> -----Original Message----- 
> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com ] On Behalf 
> Of Mark Nelson 
> Sent: Wednesday, April 22, 2015 7:31 PM 
> To: Alexandre DERUMIER; Milosz Tanski 
> Cc: ceph-devel; ceph-users 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> daemon improve performance from 100k iops to 300k iops 
> 
> Hi Alexandre, 
> 
> We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this. 
> 
> Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs. 
> 
> Mark 
> 
> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> I have done a lot of tests today, and it seems indeed numa related.
>> 
>> My numastat was 
>> 
>> # numastat 
>> node0 node1 
>> numa_hit 99075422 153976877 
>> numa_miss 167490965 1493663 
>> numa_foreign 1493663 167491417 
>> interleave_hit 157745 167015 
>> local_node 99049179 153830554 
>> other_node 167517697 1639986 
>> 
>> So, a lot of misses.
>>
>> In this case, I can reproduce ios going from 85k to 300k iops, up and down.
>> 
>> now setting 
>> echo 0 > /proc/sys/kernel/numa_balancing 
>> 
>> and starting osd daemons with 
>> 
>> numactl --interleave=all /usr/bin/ceph-osd 
>> 
>> 
>> I have a constant 300k iops ! 
>> 
>> 
>> I wonder if it could be improved by binding osd daemons to a specific numa node.
>> I have 2 numa nodes of 10 cores with 6 osds, but I think it also requires ceph.conf osd thread tuning.
>> 
>> 
>> 
>> ----- Original Message -----
>> From: "Milosz Tanski" < milosz@adfin.com >
>> To: "aderumier" < aderumier@odiso.com >
>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org >, "ceph-users"
>> < ceph-users@lists.ceph.com >
>> Sent: Wednesday, April 22, 2015 12:54:23
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> 
>> 
>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < aderumier@odiso.com > wrote: 
>> 
>> 
>> I wonder if it could be numa related, 
>> 
>> I'm using centos 7.1, 
>> and auto numa balancing is enabled
>>
>> cat /proc/sys/kernel/numa_balancing = 1
>>
>> Maybe the osd daemon accesses buffers on the wrong numa node.
>> 
>> I'll try to reproduce the problem 
>> 
>> 
>> 
>> Can you force the degenerate case using numactl? To either affirm or deny your suspicion. 
>> 
>> 
>> 
>> 
>> ----- Original Message -----
>> From: "aderumier" < aderumier@odiso.com >
>> To: "ceph-devel" < ceph-devel@vger.kernel.org >, "ceph-users" <
>> ceph-users@lists.ceph.com >
>> Sent: Wednesday, April 22, 2015 10:40:05
>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon
>> improve performance from 100k iops to 300k iops 
>> 
>> Hi, 
>> 
>> I was doing some benchmarks, 
>> I have found an strange behaviour. 
>> 
>> Using fio with rbd engine, I was able to reach around 100k iops. 
>> (osd datas in linux buffer, iostat show 0% disk access) 
>> 
>> then after restarting all osd daemons, 
>> 
>> the same fio benchmark show now around 300k iops. 
>> (osd datas in linux buffer, iostat show 0% disk access) 
>> 
>> 
>> any ideas? 
>> 
>> 
>> 
>> 
>> before restarting osd 
>> --------------------- 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=rbd, iodepth=32 ... 
>> fio-2.2.7-10-g51e9 
>> Starting 10 processes 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 
>> iops] [eta 14m:45s] 
>> fio: terminating on signal 2 
>> 
>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015
>>   read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec
>>     slat (usec): min=5, max=3685, avg=16.89, stdev=17.38
>>     clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23
>>      lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42
>>     clat percentiles (usec):
>>      |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247],
>>      | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660],
>>      | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
>>      | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
>>      | 99.99th=[47360]
>>     bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95
>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79%
>>     lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03%
>>     lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37%
>>     lat (msec) : 100=0.01%
>>   cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710
>>   IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
>>      issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>      latency   : target=0, window=0, percentile=100.00%, depth=32
>> 
>> Run status group 0 (all jobs): 
>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, 
>> mint=26215msec, maxt=26215msec 
>> 
>> Disk stats (read/write): 
>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=rbd, iodepth=32 
>> 
>> 
>> 
>> 
>> AFTER RESTARTING OSDS 
>> ---------------------- 
>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=rbd, iodepth=32 ... 
>> fio-2.2.7-10-g51e9 
>> Starting 10 processes 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> rbd engine: RBD version: 0.1.9 
>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 
>> iops] [eta 01h:09m:27s] 
>> fio: terminating on signal 2 
>> 
>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015 
>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec 
>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12 
>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28 
>>     clat percentiles (usec): 
>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450], 
>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796], 
>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>>      | 99.99th=[436224] 
>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82 
>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23% 
>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% 
>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% 
>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01% 
>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832 
>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% 
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% 
>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>>      latency   : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, 
>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
>> 
>> Disk stats (read/write): 
>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
>> 
>> 
>> 
>> 
>> CEPH LOG 
>> -------- 
>> 
>> before restarting osd 
>> ---------------------- 
>> 
>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
>> 
>> 
>> restarting osd 
>> --------------- 
>> 
>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
>> [INF] osd.0 marked itself down 
>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
>> kB/s rd, 130 op/s 
>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 
>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
>> [INF] osd.1 marked itself down 
>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
>> [INF] osd.2 marked itself down 
>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
>> [INF] osd.1 10.7.0.152:6800/14848 boot 
>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
>> [INF] osd.3 marked itself down 
>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
>> [INF] osd.6 marked itself down 
>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
>> [INF] osd.2 10.7.0.152:6812/15410 boot 
>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
>> [INF] osd.3 10.7.0.152:6816/15971 boot 
>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
>> [INF] osd.7 marked itself down 
>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
>> [INF] osd.0 10.7.0.152:6804/14481 boot 
>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 
>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>> used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
>> [INF] osd.6 10.7.0.153:6808/21955 boot 
>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
>> 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
>> [INF] osd.8 marked itself down 
>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
>> 420 GB used, 874 GB / 1295 GB avail 
>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
>> [INF] osd.7 10.7.0.153:6804/22536 boot 
>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
>> stale+active+remapped, 503 stale+active+clean, 4 
>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 
>> GB / 1295 GB avail 
>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
>> stale+active+remapped, 503 stale+active+clean, 4 
>> active+undersized+degraded+remapped, 5 peering, 13 active 
>> 
>> 
>> AFTER OSD RESTART 
>> ------------------ 
>> 
>> 
>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>> 






-- 
Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович) 
Mobile: +79229045757 



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                           ` <1270073308.621430804.1429893511370.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-24 17:36                                             ` Stefan Priebe - Profihost AG
       [not found]                                               ` <BEEF4E3E-92BD-4E41-8E17-5BF9476A3674-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  2015-04-24 18:41                                               ` Somnath Roy
  2015-04-24 17:49                                             ` Milosz Tanski
  1 sibling, 2 replies; 35+ messages in thread
From: Stefan Priebe - Profihost AG @ 2015-04-24 17:36 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-users, ceph-devel, Milosz Tanski


[-- Attachment #1.1: Type: text/plain, Size: 39646 bytes --]

Is jemalloc recommended in general? Does it also work with firefly?

Stefan

Excuse my typos, sent from my mobile phone.

> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> wrote:
> 
> Hi,
> 
> I have finished rebuilding ceph with jemalloc,
> 
> and all seems to be working fine.
> 
> I got a constant 300k iops for the moment, so no speed regression.
> 
> I'll do longer benchmarks next week.
> 
> Regards,
> 
> Alexandre
> 
> ----- Original Message -----
> From: "Irek Fasikhov" <malmyzh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> To: "Somnath Roy" <Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>, "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> Sent: Friday, April 24, 2015 13:37:52
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
> 
> Hi, Alexandre! 
> Have you tried changing the vm.min_free_kbytes parameter? 
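> 
> A sketch of how that could be tested (the value below is only an illustrative assumption, not a recommendation): 
> 
> cat /proc/sys/vm/min_free_kbytes      # current reserve 
> sysctl -w vm.min_free_kbytes=524288   # e.g. keep 512 MB free 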
> 
> 2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org > : 
> 
> 
> Alexandre, 
> You can configure with --with-jemalloc or ./do_autogen -J to build ceph with jemalloc. 
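> 
> For example, with the autotools build of that era, a minimal sketch (the parallel make flag is an arbitrary choice): 
> 
> ./autogen.sh 
> ./configure --with-jemalloc 
> make -j8 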
> 
> Thanks & Regards 
> Somnath 
> 
> -----Original Message----- 
> From: ceph-users [mailto: ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org ] On Behalf Of Alexandre DERUMIER 
> Sent: Thursday, April 23, 2015 4:56 AM 
> To: Mark Nelson 
> Cc: ceph-users; ceph-devel; Milosz Tanski 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
>>> If you have the means to compile the same version of ceph with 
>>> jemalloc, I would be very interested to see how it does.
> 
> Yes, sure. (I have around 3-4 weeks to do all the benchmarks.) 
> 
> But I don't know how to do it. 
> I'm running the cluster on centos 7.1; maybe it would be easy to patch the srpms to rebuild the package with jemalloc. 
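> 
> As a side note, one way to at least test jemalloc without rebuilding (an assumption on my side, using the usual EPEL package and library path on centos 7) would be to preload it: 
> 
> yum install -y jemalloc 
> LD_PRELOAD=/usr/lib64/libjemalloc.so.1 /usr/bin/ceph-osd -i 0 
> 
> though that only swaps the allocator at run time and won't exercise any jemalloc-specific build options. 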
> 
> 
> 
> ----- Original Message ----- 
> From: "Mark Nelson" < mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org > 
> To: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org >, "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org > 
> Cc: "ceph-users" < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org >, "ceph-devel" < ceph-devel@vger.kernel.org >, "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org > 
> Sent: Thursday, April 23, 2015 13:33:00 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Thanks for the testing Alexandre! 
> 
> If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does. 
> 
> In some ways I'm glad it turned out not to be NUMA. I still suspect we will have to deal with it at some point, but perhaps not today. ;) 
> 
> Mark 
> 
>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>> Maybe it's tcmalloc related. 
>> I thought I had patched it correctly, but perf shows a lot of 
>> tcmalloc::ThreadCache::ReleaseToCentralCache 
>> 
>> before osd restart (100k) 
>> ------------------ 
>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans 
>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle 
>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock 
>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back 
>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock 
>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string 
>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock 
>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock 
>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe 
>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit 
>>  0.58% ceph-osd ceph-osd             [.] operator< 
>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule 
>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu 
>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule 
>> 
>> 
>> after osd restart (300k iops) 
>> ------------------------------ 
>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle 
>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back 
>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock 
>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock 
>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string 
>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock 
>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock 
>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu 
>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg 
>>  0.70% ceph-osd ceph-osd             [.] Message::Message 
>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule 
>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu 
>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe 
>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule 
>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b 
>>  0.57% ceph-osd ceph-osd             [.] operator< 
>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00 
>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose 
>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to 
>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf 
>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light 
>> 
>> ----- Original Message ----- 
>> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org > 
>> To: "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org > 
>> Cc: "ceph-users" < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org >, "ceph-devel" 
>> < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >, "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org > 
>> Sent: Thursday, April 23, 2015 10:00:34 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> Hi, 
>> I'm hitting this bug again today. 
>> 
>> So it doesn't seem to be numa related (I have tried flushing the linux buffers to be sure). 
>> 
>> And tcmalloc is patched (but I don't know how to verify that the patch is actually in effect). 
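>> 
>> A quick sanity check might be to confirm which tcmalloc the running daemon has mapped and which package version it comes from, assuming the patch shipped as an updated gperftools build: 
>> 
>> grep tcmalloc /proc/$(pidof -s ceph-osd)/maps 
>> rpm -q gperftools-libs 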
>> 
>> I haven't restarted the osds yet. 
>> 
>> Maybe some perf traces could be useful? 
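>> 
>> Something along these lines, presumably (the sampling window and single-pid selection are arbitrary choices): 
>> 
>> perf record -g -p $(pidof -s ceph-osd) -- sleep 30 
>> perf report --sort symbol 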
>> 
>> 
>> ----- Original Message ----- 
>> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org > 
>> To: "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org > 
>> Cc: "ceph-users" < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org >, "ceph-devel" 
>> < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >, "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org > 
>> Sent: Wednesday, April 22, 2015 18:30:26 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> Hi, 
>> 
>>>> I feel it is due to a tcmalloc issue
>> 
>> Indeed, I had patched one of my nodes, but not the other. 
>> So maybe I have hit this bug. (But I can't confirm; I don't have traces.) 
>> 
>> But numa interleaving seems to help in my case (maybe not from 100k->300k, but 250k->300k). 
>> 
>> I need to do longer tests to confirm that. 
>> 
>> 
>> ----- Original Message ----- 
>> From: "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org > 
>> To: "Mark Nelson" < mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org >, "aderumier" 
>> < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org >, "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org > 
>> Cc: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >, "ceph-users" 
>> < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org > 
>> Sent: Wednesday, April 22, 2015 16:34:33 
>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> I feel it is due to a tcmalloc issue. 
>> 
>> I have seen a similar issue in my setup after 20 days. 
>> 
>> Thanks, 
>> Srinivas 
>> 
>> 
>> 
>> -----Original Message----- 
>> From: ceph-users [mailto: ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org ] On Behalf 
>> Of Mark Nelson 
>> Sent: Wednesday, April 22, 2015 7:31 PM 
>> To: Alexandre DERUMIER; Milosz Tanski 
>> Cc: ceph-devel; ceph-users 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> Hi Alexandre, 
>> 
>> We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this. 
>> 
>> Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs. 
>> 
>> Mark 
>> 
>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>> Hi, 
>>> 
>>> I have done a lot of tests today, and it does indeed seem numa related. 
>>> 
>>> My numastat was 
>>> 
>>> # numastat 
>>>                            node0           node1 
>>> numa_hit                99075422       153976877 
>>> numa_miss              167490965         1493663 
>>> numa_foreign             1493663       167491417 
>>> interleave_hit            157745          167015 
>>> local_node              99049179       153830554 
>>> other_node             167517697         1639986 
>>> 
>>> So, a lot of misses. 
>>> 
>>> In this case, I can reproduce IOPS going from 85k to 300k, up and down. 
>>> 
>>> now setting 
>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>> 
>>> and starting osd daemons with 
>>> 
>>> numactl --interleave=all /usr/bin/ceph-osd 
>>> 
>>> 
>>> I have a constant 300k iops! 
>>> 
>>> 
>>> I wonder if it could be improved by binding osd daemons to specific numa nodes. 
>>> I have 2 numa nodes of 10 cores each, with 6 osds, but I think it would also require tuning the osd thread counts in ceph.conf. 
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> From: "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org > 
>>> To: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org > 
>>> Cc: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >, "ceph-users" 
>>> < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org > 
>>> Sent: Wednesday, April 22, 2015 12:54:23 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> 
>>> 
>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org > wrote: 
>>> 
>>> 
>>> I wonder if it could be numa related, 
>>> 
>>> I'm using centos 7.1, 
>>> and auto numa balancing is enabled 
>>> 
>>> cat /proc/sys/kernel/numa_balancing = 1 
>>> 
>>> Maybe the osd daemons are accessing buffers on the wrong numa node. 
>>> 
>>> I'll try to reproduce the problem 
>>> 
>>> 
>>> 
>>> Can you force the degenerate case using numactl, to either confirm or rule out your suspicion? 
>>> 
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org > 
>>> To: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >, "ceph-users" < 
>>> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org > 
>>> Sent: Wednesday, April 22, 2015 10:40:05 
>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon 
>>> improve performance from 100k iops to 300k iops 
>>> 
>>> Hi, 
>>> 
>>> I was doing some benchmarks, 
>>> I have found a strange behaviour. 
>>> 
>>> Using fio with rbd engine, I was able to reach around 100k iops. 
>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>> 
>>> then after restarting all osd daemons, 
>>> 
>>> the same fio benchmark now shows around 300k iops. 
>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>> 
>>> 
>>> any ideas? 
>>> 
>>> 
>>> 
>>> 
>>> [fio results and ceph logs snipped; identical to the runs and logs quoted earlier in the thread] 
>>> 
>>> 
>>> AFTER OSD RESTART 
>>> ------------------ 
>>> 
>>> 
>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>>> 
>>> _______________________________________________ 
>>> ceph-users mailing list 
>>> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> -- 
> Best regards, Irek Fasikhov 
> Mob.: +79229045757 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #1.2: Type: text/html, Size: 98551 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                           ` <1270073308.621430804.1429893511370.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  2015-04-24 17:36                                             ` Stefan Priebe - Profihost AG
@ 2015-04-24 17:49                                             ` Milosz Tanski
  1 sibling, 0 replies; 35+ messages in thread
From: Milosz Tanski @ 2015-04-24 17:49 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-users, ceph-devel

On Fri, Apr 24, 2015 at 12:38 PM, Alexandre DERUMIER
<aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> wrote:
>
> Hi,
>
> I have finished rebuilding ceph with jemalloc,
>
> and everything seems to be working fine.
>
> I got a constant 300k iops for the moment, so no speed regression.
>
> I'll run longer benchmarks next week.
>
> Regards,
>
> Alexandre


In my experience jemalloc is much more proactive about returning memory
to the OS, whereas tcmalloc in its default setting is much greedier about
keeping and reusing memory. jemalloc tends to do better if your application
benefits from a large page cache. Also, jemalloc's aggressive behavior
is better if you're running a lot of applications per host, because
you're less likely to trigger kernel dirty write-out when allocating
space (you're not keeping a large free cache around per application).

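For anyone who wants to compare allocators without rebuilding, a minimal
sketch (assuming a distro jemalloc package is installed; the library path
varies by distro):

# hypothetical: preload jemalloc over the allocator ceph was built with
LD_PRELOAD=/usr/lib64/libjemalloc.so.1 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf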
Howard of Symas and LMDB fame did some benchmarking and comparison
here: http://symas.com/mdb/inmem/malloc/ He came to somewhat similar
conclusions.

It would be helpful if you can reproduce the issue with tcmalloc...
Turn on tcmalloc stats logging (every 1GB allocated or so), then
compare the size claimed by tcmalloc to the process RSS. If you
can account for a large difference, especially multiplied across a number
of OSDs, that may be the culprit.

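A rough sketch of that comparison, assuming the OSDs were built with
tcmalloc and expose ceph's heap commands:

ceph tell osd.0 heap stats                      # tcmalloc's own accounting
grep VmRSS /proc/$(pidof -s ceph-osd)/status    # what the kernel sees
ceph tell osd.0 heap release                    # ask tcmalloc to return free pages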
I know things have gotten better in tcmalloc: they fixed a few bugs
where really large allocations were never returned to the OS, and
they turned down the default greediness. Sadly, distros have been slow at
picking these fixes up in the past. If this is the problem, it might be
worth having an option to build a known-good version of tcmalloc
into Ceph at build time.

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                               ` <BEEF4E3E-92BD-4E41-8E17-5BF9476A3674-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2015-04-24 18:02                                                 ` Mark Nelson
       [not found]                                                   ` <553A8527.3020503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-04-24 18:38                                                 ` Somnath Roy
  1 sibling, 1 reply; 35+ messages in thread
From: Mark Nelson @ 2015-04-24 18:02 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Alexandre DERUMIER
  Cc: ceph-users, ceph-devel, Milosz Tanski

We haven't done any kind of real testing on jemalloc, so use at your own 
peril.  Having said that, we've also been very interested in hearing 
community feedback from folks trying it out, so please feel free to give 
it a shot. :D

Mark

On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
> Is jemalloc recommended in general? Does it also work for firefly?
>
> Stefan
>
> Excuse my typo sent from my mobile phone.
>
> Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER <aderumier@odiso.com
> <mailto:aderumier@odiso.com>>:
>
>> Hi,
>>
>> I have finished rebuilding ceph with jemalloc,
>>
>> and everything seems to be working fine.
>>
>> I got a constant 300k iops for the moment, so no speed regression.
>>
>> I'll run longer benchmarks next week.
>>
>> Regards,
>>
>> Alexandre
>>
>> ----- Mail original -----
>> De: "Irek Fasikhov" <malmyzh@gmail.com <mailto:malmyzh@gmail.com>>
>> À: "Somnath Roy" <Somnath.Roy@sandisk.com
>> <mailto:Somnath.Roy@sandisk.com>>
>> Cc: "aderumier" <aderumier@odiso.com <mailto:aderumier@odiso.com>>,
>> "Mark Nelson" <mnelson@redhat.com <mailto:mnelson@redhat.com>>,
>> "ceph-users" <ceph-users@lists.ceph.com
>> <mailto:ceph-users@lists.ceph.com>>, "ceph-devel"
>> <ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>>,
>> "Milosz Tanski" <milosz@adfin.com <mailto:milosz@adfin.com>>
>> Envoyé: Vendredi 24 Avril 2015 13:37:52
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> Hi, Alexandre! 
>> Have you tried changing the parameter vm.min_free_kbytes? 
>>
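For concreteness, that tuning would look something like this (the value
is purely illustrative; it should be sized to the host's RAM):

sysctl -w vm.min_free_kbytes=262144   # e.g. reserve 256 MB for the kernel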
>> 2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy@sandisk.com
>> <mailto:Somnath.Roy@sandisk.com> > :
>>
>>
>> Alexandre,
>> You can configure with --with-jemalloc or ./do_autogen -J to build
>> ceph with jemalloc.
>>
>> Thanks & Regards
>> Somnath
>>
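A minimal sketch of such a build, assuming an autotools-based source
tree of this era:

./autogen.sh
./configure --with-jemalloc
make -j$(nproc)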
>> -----Original Message-----
>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com
>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Alexandre
>> DERUMIER
>> Sent: Thursday, April 23, 2015 4:56 AM
>> To: Mark Nelson
>> Cc: ceph-users; ceph-devel; Milosz Tanski
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>>>> If you have the means to compile the same version of ceph with
>>>> jemalloc, I would be very interested to see how it does.
>>
>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks.)
>>
>> But I don't know how to do it?
>> I'm running the cluster on centos7.1; maybe it would be easy to patch
>> the srpms to rebuild the package with jemalloc.
>>
>>
>>
>> ----- Mail original -----
>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >
>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >,
>> "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>> <mailto:Srinivasula.Maram@sandisk.com> >
>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" <
>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>> Envoyé: Jeudi 23 Avril 2015 13:33:00
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> Thanks for the testing Alexandre!
>>
>> If you have the means to compile the same version of ceph with
>> jemalloc, I would be very interested to see how it does.
>>
>> In some ways I'm glad it turned out not to be NUMA. I still suspect we
>> will have to deal with it at some point, but perhaps not today. ;)
>>
>> Mark
>>
>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>> Maybe it's tcmalloc related. 
>>> I thought I had patched it correctly, but perf shows a lot of 
>>> tcmalloc::ThreadCache::ReleaseToCentralCache. 
>>>
>>> before osd restart (100k)
>>> ------------------
>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle
>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock
>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
>>>  0.58% ceph-osd ceph-osd             [.] operator<
>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule
>>>
>>>
>>> after osd restart (300k iops)
>>> ------------------------------
>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle
>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock
>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
>>>  0.70% ceph-osd ceph-osd             [.] Message::Message
>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule
>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
>>>  0.57% ceph-osd ceph-osd             [.] operator<
>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf
>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light
>>>
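For reference, a profile like the above can be captured with something
like:

perf record -a -g -- sleep 30   # sample all CPUs for ~30s
perf report                     # then inspect the hot symbols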
>>> ----- Mail original -----
>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>> Envoyé: Jeudi 23 Avril 2015 10:00:34
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> Hi, 
>>> I'm hitting this bug again today. 
>>> 
>>> So it doesn't seem to be numa related (I have tried flushing the linux 
>>> buffer cache to be sure), 
>>> 
>>> and tcmalloc is patched (I don't know how to verify that it's ok). 
>>> 
>>> I haven't restarted the osds yet. 
>>> 
>>> Maybe some perf traces could be useful? 
>>>
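Two sketches that may help here: flushing the page cache for a
cold-cache run, and checking which tcmalloc an osd is actually linked
against (the package name assumes CentOS):

sync; echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
ldd /usr/bin/ceph-osd | grep tcmalloc     # which libtcmalloc is loaded
rpm -q gperftools-libs                    # which build/version is installed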
>>>
>>> ----- Mail original -----
>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>> Envoyé: Mercredi 22 Avril 2015 18:30:26
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> Hi,
>>>
>>>>> I feel it is due to the tcmalloc issue 
>>> 
>>> Indeed, I had patched one of my nodes, but not the other, 
>>> so maybe I have hit this bug (but I can't confirm it; I don't have 
>>> traces). 
>>> 
>>> But numa interleaving seems to help in my case (maybe not from 
>>> 100k->300k, but 250k->300k). 
>>> 
>>> I need to run longer tests to confirm that. 
>>>
>>>
>>> ----- Mail original -----
>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >,
>>> "aderumier"
>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski"
>>> < milosz@adfin.com <mailto:milosz@adfin.com> >
>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>> Envoyé: Mercredi 22 Avril 2015 16:34:33
>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> I feel it is due to the tcmalloc issue. 
>>> 
>>> I have seen a similar issue in my setup after 20 days. 
>>>
>>> Thanks,
>>> Srinivas
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com
>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf
>>> Of Mark Nelson
>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>> To: Alexandre DERUMIER; Milosz Tanski
>>> Cc: ceph-devel; ceph-users
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> Hi Alexandre,
>>>
>>> We should discuss this at the perf meeting today. We knew NUMA node
>>> affinity issues were going to crop up sooner or later (and indeed
>>> already have in some cases), but this is pretty major. It's probably
>>> time to really dig in and figure out how to deal with this.
>>>
>>> Note: this is one of the reasons I like small nodes with single
>>> sockets and fewer OSDs.
>>>
>>> Mark
>>>
>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>> Hi,
>>>>
>>>> I have done a lot of tests today, and it seems indeed numa related.
>>>>
>>>> My numastat was
>>>>
>>>> # numastat
>>>>                     node0       node1
>>>> numa_hit         99075422   153976877
>>>> numa_miss       167490965     1493663
>>>> numa_foreign      1493663   167491417
>>>> interleave_hit     157745      167015
>>>> local_node       99049179   153830554
>>>> other_node      167517697     1639986
>>>>
>>>> So, a lot of misses. 
>>>>
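A per-process view can make this clearer; for example (numastat from the
numactl package):

numastat -p $(pidof -s ceph-osd)   # NUMA placement of one osd's memory
numastat -m                        # per-node meminfo-style summary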
>>>> In this case, I can reproduce iops going from 85k to 300k, up 
>>>> and down. 
>>>>
>>>> now setting
>>>> echo 0 > /proc/sys/kernel/numa_balancing
>>>>
>>>> and starting osd daemons with
>>>>
>>>> numactl --interleave=all /usr/bin/ceph-osd
>>>>
>>>>
>>>> I have a constant 300k iops! 
>>>>
>>>>
>>>> I wonder if it could be improved by binding osd daemons to specific 
>>>> numa nodes. 
>>>> I have 2 numa nodes of 10 cores each with 6 osds, but I think it would 
>>>> also require tuning the osd thread settings in ceph.conf. 
>>>>
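An untested sketch of that binding idea, with 3 of the 6 osds pinned to
each node (the osd ids and plain invocation are illustrative):

for i in 0 1 2; do numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i $i; done
for i in 3 4 5; do numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i $i; done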
>>>>
>>>>
>>>> ----- Mail original -----
>>>> De: "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>> Envoyé: Mercredi 22 Avril 2015 12:54:23
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>>
>>>>
>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <
>>>> aderumier@odiso.com <mailto:aderumier@odiso.com> > wrote:
>>>>
>>>>
>>>> I wonder if it could be numa related,
>>>>
>>>> I'm using centos 7.1,
>>>> and auto numa balancing is enabled 
>>>>
>>>> cat /proc/sys/kernel/numa_balancing = 1
>>>>
>>>> Maybe the osd daemons access buffers on the wrong numa node. 
>>>>
>>>> I'll try to reproduce the problem
>>>>
>>>>
>>>>
>>>> Can you force the degenerate case using numactl? To either affirm or
>>>> deny your suspicion.
>>>>
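For example, forcing cross-node memory on purpose (a hypothetical worst
case) would look like:

numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0   # run on node 0, allocate on node 1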
>>>>
>>>>
>>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                               ` <BEEF4E3E-92BD-4E41-8E17-5BF9476A3674-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  2015-04-24 18:02                                                 ` Mark Nelson
@ 2015-04-24 18:38                                                 ` Somnath Roy
  1 sibling, 0 replies; 35+ messages in thread
From: Somnath Roy @ 2015-04-24 18:38 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Alexandre DERUMIER
  Cc: ceph-users, ceph-devel, Milosz Tanski


[-- Attachment #1.1: Type: text/plain, Size: 36922 bytes --]

<< Does it also work for firefly?
No, it was integrated post-Giant.

Thanks & Regards
Somnath

From: Stefan Priebe - Profihost AG [mailto:s.priebe@profihost.ag]
Sent: Friday, April 24, 2015 10:37 AM
To: Alexandre DERUMIER
Cc: ceph-users; ceph-devel; Somnath Roy; Mark Nelson; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Is jemalloc recommended in general? Does it also work for firefly?

Stefan

Excuse my typo sent from my mobile phone.

Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER <aderumier@odiso.com<mailto:aderumier@odiso.com>>:
Hi,

I have finished rebuilding ceph with jemalloc,

and everything seems to be working fine.

I got a constant 300k iops for the moment, so no speed regression.

I'll run longer benchmarks next week.

Regards,

Alexandre

----- Mail original -----
De: "Irek Fasikhov" <malmyzh@gmail.com<mailto:malmyzh@gmail.com>>
À: "Somnath Roy" <Somnath.Roy@sandisk.com<mailto:Somnath.Roy@sandisk.com>>
Cc: "aderumier" <aderumier@odiso.com<mailto:aderumier@odiso.com>>, "Mark Nelson" <mnelson@redhat.com<mailto:mnelson@redhat.com>>, "ceph-users" <ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>>, "ceph-devel" <ceph-devel@vger.kernel.org<mailto:ceph-devel@vger.kernel.org>>, "Milosz Tanski" <milosz@adfin.com<mailto:milosz@adfin.com>>
Envoyé: Vendredi 24 Avril 2015 13:37:52
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi, Alexandre!
Have you tried changing the parameter vm.min_free_kbytes?

2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy@sandisk.com<mailto:Somnath.Roy@sandisk.com> > :


Alexandre,
You can configure with --with-jemalloc or ./do_autogen -J to build ceph with jemalloc.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com<mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Alexandre DERUMIER
Sent: Thursday, April 23, 2015 4:56 AM
To: Mark Nelson
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops


If you have the means to compile the same version of ceph with
jemalloc, I would be very interested to see how it does.

Yes, sure. (I have around 3-4 weeks to do all the benchmarks.)

But I don't know how to do it?
I'm running the cluster on centos7.1; maybe it would be easy to patch the srpms to rebuild the package with jemalloc.



----- Mail original -----
De: "Mark Nelson" < mnelson@redhat.com<mailto:mnelson@redhat.com> >
À: "aderumier" < aderumier@odiso.com<mailto:aderumier@odiso.com> >, "Srinivasula Maram" < Srinivasula.Maram@sandisk.com<mailto:Srinivasula.Maram@sandisk.com> >
Cc: "ceph-users" < ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com> >, "ceph-devel" < ceph-devel@vger.kernel.org<mailto:ceph-devel@vger.kernel.org> >, "Milosz Tanski" < milosz@adfin.com<mailto:milosz@adfin.com> >
Envoyé: Jeudi 23 Avril 2015 13:33:00
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Thanks for the testing Alexandre!

If you have the means to compile the same version of ceph with jemalloc, I would be very interested to see how it does.

In some ways I'm glad it turned out not to be NUMA. I still suspect we will have to deal with it at some point, but perhaps not today. ;)

Mark

On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:

Maybe it's tcmalloc related.
I thought I had patched it correctly, but perf shows a lot of
tcmalloc::ThreadCache::ReleaseToCentralCache.

before osd restart (100k)
------------------
11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
 8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
 1.63% swapper  [kernel.kallsyms]    [k] intel_idle
 1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
 1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
 0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
 0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
 0.81% ceph-osd ceph-osd             [.] Mutex::Lock
 0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
 0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
 0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
 0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
 0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
 0.58% ceph-osd ceph-osd             [.] operator<
 0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
 0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
 0.54% swapper  [kernel.kallsyms]    [k] __schedule


after osd restart (300k iops)
------------------------------
 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
 1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
 1.86% swapper  [kernel.kallsyms]    [k] intel_idle
 1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
 1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
 1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
 1.23% ceph-osd ceph-osd             [.] Mutex::Lock
 1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
 1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
 0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
 0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
 0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
 0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
 0.70% ceph-osd ceph-osd             [.] Message::Message
 0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
 0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
 0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
 0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
 0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
 0.60% swapper  [kernel.kallsyms]    [k] __schedule
 0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
 0.57% ceph-osd ceph-osd             [.] operator<
 0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
 0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
 0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
 0.54% ceph-osd libc-2.17.so         [.] vfprintf
 0.52% ceph-osd [kernel.kallsyms]    [k] fget_light

----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Thursday, April 23, 2015 10:00:34
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi,
I'm hitting this bug again today.

So it doesn't seem to be NUMA related (I have tried flushing the Linux buffer cache to be sure).

And tcmalloc is patched (I don't know how to verify that it's OK).

I haven't restarted the OSDs yet.

Maybe some perf traces could be useful?
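
For reference, a minimal way to capture such a trace from a running OSD, and to check which allocator the binary is actually linked against (standard perf/ldd usage; the 30-second sampling window is arbitrary):

# confirm the allocator in use
ldd /usr/bin/ceph-osd | grep -E 'tcmalloc|jemalloc'
# sample one OSD process for 30 seconds, then inspect by symbol
perf record -g -p $(pidof -s ceph-osd) -- sleep 30
perf report --sort symbol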


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Wednesday, April 22, 2015 18:30:26
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi,

>> I feel it is due to tcmalloc issue

Indeed, I had patched one of my nodes, but not the other,
so maybe I have hit this bug (but I can't confirm; I don't have traces).

But NUMA interleaving does seem to help in my case (maybe not from 100k->300k, but from 250k->300k).

I need to run longer tests to confirm that.


----- Original Message -----
From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 16:34:33
Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

I feel it is due to a tcmalloc issue.

I have seen a similar issue in my setup after 20 days.

Thanks,
Srinivas
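
If it is the tcmalloc thread-cache problem, one mitigation floated around this time (an assumption here, not something verified in this thread) is enlarging tcmalloc's aggregate thread cache via the environment before starting the OSDs:

# hypothetical workaround: give tcmalloc a 128 MB thread cache
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf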



-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf
Of Mark Nelson
Sent: Wednesday, April 22, 2015 7:31 PM
To: Alexandre DERUMIER; Milosz Tanski
Cc: ceph-devel; ceph-users
Subject: Re: [ceph-users] strange benchmark problem : restarting osd
daemon improve performance from 100k iops to 300k iops

Hi Alexandre,

We should discuss this at the perf meeting today. We knew NUMA node affinity issues were going to crop up sooner or later (and indeed already have in some cases), but this is pretty major. It's probably time to really dig in and figure out how to deal with this.

Note: this is one of the reasons I like small nodes with single sockets and fewer OSDs.

Mark

On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
Hi,

I have done a lot of tests today, and it does indeed seem NUMA related.

My numastat was

# numastat
                    node0      node1
numa_hit         99075422  153976877
numa_miss       167490965    1493663
numa_foreign      1493663  167491417
interleave_hit     157745     167015
local_node       99049179  153830554
other_node      167517697    1639986

So, a lot of misses.

In this case, I can reproduce IOs going from 85k to 300k iops, up and down.

now setting
echo 0 > /proc/sys/kernel/numa_balancing

and starting osd daemons with

numactl --interleave=all /usr/bin/ceph-osd


I have a constant 300k iops !


I wonder if it could be improved by binding the OSD daemons to specific NUMA nodes (a sketch follows below).
I have 2 NUMA nodes of 10 cores each, with 6 OSDs, but I think it would also require tuning the OSD thread settings in ceph.conf.
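
A minimal sketch of that binding, assuming OSDs 0-2 are pinned to node 0 and OSDs 3-5 to node 1 (the id-to-node split and the config path are assumptions, not from this thread):

echo 0 > /proc/sys/kernel/numa_balancing
# pin each OSD's CPUs and memory to one node
for id in 0 1 2; do
  numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i $id -c /etc/ceph/ceph.conf
done
for id in 3 4 5; do
  numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i $id -c /etc/ceph/ceph.conf
done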



----- Original Message -----
From: "Milosz Tanski" <milosz@adfin.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 12:54:23
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops



On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:


I wonder if it could be NUMA related.

I'm using CentOS 7.1,
and automatic NUMA balancing is enabled:

cat /proc/sys/kernel/numa_balancing = 1

Maybe the OSD daemons access buffers on the wrong NUMA node.

I'll try to reproduce the problem



Can you force the degenerate case using numactl, to either confirm or rule out your suspicion?
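
A sketch of how that degenerate case could be forced (standard numactl flags; the exact OSD invocation is an assumption):

# run the OSD on node 0's cores but node 1's memory,
# so every allocation is a remote access
numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf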




----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
Sent: Wednesday, April 22, 2015 10:40:05
Subject: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi,

I was doing some benchmarks,
and I have found a strange behaviour.

Using fio with the rbd engine, I was able to reach around 100k iops.
(OSD data is in the Linux buffer cache; iostat shows 0% disk access.)

Then, after restarting all the OSD daemons,

the same fio benchmark now shows around 300k iops.
(OSD data is in the Linux buffer cache; iostat shows 0% disk access.)


any ideas?




before restarting osd
---------------------
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
...
fio-2.2.7-10-g51e9
Starting 10 processes
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 iops] [eta 14m:45s]
fio: terminating on signal 2

rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015
  read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec
    slat (usec): min=5, max=3685, avg=16.89, stdev=17.38
    clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23
     lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42
    clat percentiles (usec):
     |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247],
     | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660],
     | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
     | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
     | 99.99th=[47360]
    bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95
    lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79%
    lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03%
    lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37%
    lat (msec) : 100=0.01%
  cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710
  IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
     issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s,
mint=26215msec, maxt=26215msec

Disk stats (read/write):
sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
[root@ceph1-3 fiorbd]# ./fio fiorbd
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
ioengine=rbd, iodepth=32




AFTER RESTARTING OSDS
----------------------
[root@ceph1-3 fiorbd]# ./fio fiorbd
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
...
fio-2.2.7-10-g51e9
Starting 10 processes
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
rbd engine: RBD version: 0.1.9
^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops] [eta 01h:09m:27s]
fio: terminating on signal 2

rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
  read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec
    slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
    clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
     lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
    clat percentiles (usec):
     |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
     | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
     | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
     | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
     | 99.99th=[436224]
    bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
    lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
    lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
    lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
    lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
  cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
  IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
     issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
maxb=1036.8MB/s, mint=7389msec, maxt=7389msec

Disk stats (read/write):
sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
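
The fiorbd job file itself is not shown in the thread; a plausible reconstruction from the output above (the pool, image, and client names are assumptions) would be:

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
rw=randread
bs=4k
iodepth=32
numjobs=10
group_reporting

[rbd_iodepth32-test]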




CEPH LOG
--------

before restarting osd
----------------------

2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster
[INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 298 MB/s rd, 76465 op/s
2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster
[INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 333 MB/s rd, 85355 op/s
2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster
[INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 343 MB/s rd, 87932 op/s
2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster
[INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 328 MB/s rd, 84151 op/s
2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster
[INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 237 MB/s rd, 60855 op/s
2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster
[INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 144 MB/s rd, 36935 op/s
2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster
[INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 321 MB/s rd, 82334 op/s
2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster
[INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 368 MB/s rd, 94211 op/s
2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster
[INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 244 MB/s rd, 62644 op/s
2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster
[INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 175 MB/s rd, 44997 op/s
2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster
[INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 122 MB/s rd, 31259 op/s
2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster
[INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 229 MB/s rd, 58674 op/s
2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster
[INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 271 MB/s rd, 69501 op/s
2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster
[INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 211 MB/s rd, 54020 op/s
2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster
[INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 164 MB/s rd, 42001 op/s
2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster
[INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 134 MB/s rd, 34380 op/s
2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster
[INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 293 MB/s rd, 75213 op/s
2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster
[INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 337 MB/s rd, 86353 op/s
2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster
[INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 229 MB/s rd, 58839 op/s
2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster
[INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
1295 GB avail; 152 MB/s rd, 39117 op/s


restarting osd
---------------

2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
[INF] osd.0 marked itself down
2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
[INF] osdmap e849: 9 osds: 8 up, 9 in
2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
[INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794
active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516
kB/s rd, 130 op/s
2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
[INF] osdmap e850: 9 osds: 8 up, 9 in
2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
[INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794
active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
[INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794
active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
[INF] osd.1 marked itself down
2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
[INF] osdmap e851: 9 osds: 7 up, 9 in
2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
[INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
[INF] osdmap e852: 9 osds: 7 up, 9 in
2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
[INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
[INF] osd.2 marked itself down
2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
[INF] osdmap e853: 9 osds: 6 up, 9 in
2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
[INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
[INF] osdmap e854: 9 osds: 6 up, 9 in
2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
[INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
[INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
[INF] osd.1 10.7.0.152:6800/14848 boot
2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
[INF] osdmap e855: 9 osds: 7 up, 9 in
2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
[INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
[INF] osd.3 marked itself down
2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
[INF] osdmap e856: 9 osds: 6 up, 9 in
2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
[INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
[INF] osd.6 marked itself down
2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
[INF] osd.2 10.7.0.152:6812/15410 boot
2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
[INF] osdmap e857: 9 osds: 6 up, 9 in
2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
[INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
[INF] osdmap e858: 9 osds: 6 up, 9 in
2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
[INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
[INF] osd.3 10.7.0.152:6816/15971 boot
2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
[INF] osdmap e859: 9 osds: 7 up, 9 in
2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
[INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
[INF] osd.7 marked itself down
2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
[INF] osd.0 10.7.0.152:6804/14481 boot
2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
[INF] osdmap e860: 9 osds: 7 up, 9 in
2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
[INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292
active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
used, 874 GB / 1295 GB avail
2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
[INF] osd.6 10.7.0.153:6808/21955 boot
2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
[INF] osdmap e861: 9 osds: 8 up, 9 in
2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
[INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
stale+active+remapped, 498 stale+active+clean, 2 peering, 28
active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
[INF] osd.8 marked itself down
2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
[INF] osdmap e862: 9 osds: 7 up, 9 in
2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
[INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
stale+active+remapped, 587 stale+active+clean, 2 peering, 11
active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
420 GB used, 874 GB / 1295 GB avail
2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
[INF] osd.7 10.7.0.153:6804/22536 boot
2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
[INF] osdmap e863: 9 osds: 8 up, 9 in
2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
[INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
stale+active+remapped, 503 stale+active+clean, 4
active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874
GB / 1295 GB avail
2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
[INF] osdmap e864: 9 osds: 8 up, 9 in
2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
[INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
stale+active+remapped, 503 stale+active+clean, 4
active+undersized+degraded+remapped, 5 peering, 13 active


AFTER OSD RESTART
------------------


2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
[INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 786 MB/s rd, 196 kop/s
2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
[INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 1578 MB/s rd, 394 kop/s
2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
[INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 932 MB/s rd, 233 kop/s
2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
[INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 627 MB/s rd, 156 kop/s
2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
[INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 1034 MB/s rd, 258 kop/s
2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
[INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 529 MB/s rd, 132 kop/s
2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
[INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 770 MB/s rd, 192 kop/s
2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
[INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 1358 MB/s rd, 339 kop/s
2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
[INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 649 MB/s rd, 162 kop/s
2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
[INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 580 MB/s rd, 145 kop/s
2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
[INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 962 MB/s rd, 240 kop/s
2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
[INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 506 MB/s rd, 126 kop/s
2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
[INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 774 MB/s rd, 193 kop/s
2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
[INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 1363 MB/s rd, 340 kop/s
2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
[INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 662 MB/s rd, 165 kop/s
2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
[INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 593 MB/s rd, 148 kop/s
2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
[INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
1295 GB avail; 938 MB/s rd, 234 kop/s

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







--
Best regards, Irek Fasikhov
Mobile: +79229045757



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #1.2: Type: text/html, Size: 161615 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-24 17:36                                             ` Stefan Priebe - Profihost AG
       [not found]                                               ` <BEEF4E3E-92BD-4E41-8E17-5BF9476A3674-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2015-04-24 18:41                                               ` Somnath Roy
  1 sibling, 0 replies; 35+ messages in thread
From: Somnath Roy @ 2015-04-24 18:41 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Alexandre DERUMIER
  Cc: ceph-users, ceph-devel, Milosz Tanski


[-- Attachment #1.1: Type: text/plain, Size: 1217 bytes --]

<< Does it also work for firefly?
No, it was integrated post-Giant.

Thanks & Regards
Somnath

From: Stefan Priebe - Profihost AG [mailto:s.priebe@profihost.ag]
Sent: Friday, April 24, 2015 10:37 AM
To: Alexandre DERUMIER
Cc: ceph-users; ceph-devel; Somnath Roy; Mark Nelson; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Is jemalloc recommended in general? Does it also work for firefly?

Stefan

Excuse my typo sent from my mobile phone.





[-- Attachment #1.2: Type: text/html, Size: 4625 bytes --]

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                   ` <553A8527.3020503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-04-25  4:45                                                     ` Alexandre DERUMIER
       [not found]                                                       ` <1416266012.633995646.1429937143054.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-25  4:45 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

>> We haven't done any kind of real testing on jemalloc, so use at your own
>> peril. Having said that, we've also been very interested in hearing
>> community feedback from folks trying it out, so please feel free to give
>> it a shot. :D

Some feedback: I have run benchmarks all night, with no speed regression.

And I get a speed increase with fio with more jobs (with tcmalloc, it seems to be the reverse).

with tcmalloc :

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 290k iops
20 fio-rbd jobs = 270k iops
40 fio-rbd jobs = 250k iops

(all with values going up and down during the fio bench)


with jemalloc:

10 fio-rbd jobs = 300k iops
15 fio-rbd jobs = 320k iops
20 fio-rbd jobs = 330k iops
40 fio-rbd jobs = 370k iops (could get more; currently only 1 client machine, with its 20 cores at 100%)

(all with constant values during the fio bench)
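
As an aside, when rebuilding packages is not practical, an allocator can often be swapped in for a quick test by preloading it at start-up; a sketch (the library path is an assumption, and whether the init scripts propagate the variable is not covered here):

# jemalloc from EPEL on CentOS 7; the exact path may differ
LD_PRELOAD=/usr/lib64/libjemalloc.so.1 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf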

----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
Sent: Friday, April 24, 2015 20:02:15
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

We haven't done any kind of real testing on jemalloc, so use at your own 
peril. Having said that, we've also been very interested in hearing 
community feedback from folks trying it out, so please feel free to give 
it a shot. :D 

Mark 

On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
> Is jemalloc recommended in general? Does it also work for firefly?
> 
> Stefan 
> 
> Excuse my typo sent from my mobile phone. 
> 
> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
> 
>> Hi, 
>> 
>> I have finished rebuilding ceph with jemalloc,
>>
>> and all seems to be working fine.
>>
>> I got a constant 300k iops for the moment, so no speed regression.
>>
>> I'll do longer benchmarks next week.
>> 
>> Regards, 
>> 
>> Alexandre 
>> 
>> ----- Original Message -----
>> From: "Irek Fasikhov" <malmyzh@gmail.com>
>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>,
>> "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>,
>> "Milosz Tanski" <milosz@adfin.com>
>> Sent: Friday, April 24, 2015 13:37:52
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>> 
>> Hi, Alexandre!
>> Have you tried changing the parameter vm.min_free_kbytes?
>> 
>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>> 
>> 
>> Alexandre, 
>> You can configure with --with-jemalloc or ./do_autogen -J to build 
>> ceph with jemalloc. 
>> 
>> Thanks & Regards 
>> Somnath 
>> 
>> -----Original Message----- 
>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com 
>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Alexandre 
>> DERUMIER 
>> Sent: Thursday, April 23, 2015 4:56 AM 
>> To: Mark Nelson 
>> Cc: ceph-users; ceph-devel; Milosz Tanski 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>>>> If you have the means to compile the same version of ceph with 
>>>> jemalloc, I would be very interested to see how it does. 
>> 
>> Yes, sure. (I have around 3-4 weeks to do all the benchs) 
>> 
>> But I don't know how to do it ? 
>> I'm running the cluster on centos7.1, maybe it can be easy to patch 
>> the srpms to rebuild the package with jemalloc. 
>> 
>> 
>> 
>> ----- Mail original ----- 
>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> > 
>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >, 
>> "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>> <mailto:Srinivasula.Maram@sandisk.com> > 
>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" < 
>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>> Envoyé: Jeudi 23 Avril 2015 13:33:00 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>> Thanks for the testing Alexandre! 
>> 
>> If you have the means to compile the same version of ceph with 
>> jemalloc, I would be very interested to see how it does. 
>> 
>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
>> will have to deal with it at some point, but perhaps not today. ;) 
>> 
>> Mark 
>> 
>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>>> Maybe it's tcmalloc related 
>>> I thinked to have patched it correctly, but perf show a lot of 
>>> tcmalloc::ThreadCache::ReleaseToCentralCache 
>>> 
>>> before osd restart (100k) 
>>> ------------------ 
>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] 
>>> tcmalloc::ThreadCache::ReleaseToCentralCache 
>>> 8.51% ceph-osd libtcmalloc.so.4.1.2 [.] 
>>> tcmalloc::CentralFreeList::FetchFromSpans 
>>> 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] 
>>> tcmalloc::CentralFreeList::ReleaseToSpans 
>>> 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.63% swapper 
>>> [kernel.kallsyms] [k] intel_idle 1.35% ceph-osd libtcmalloc.so.4.1.2 
>>> [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
>>> 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 1.07% ceph-osd 
>>> libstdc++.so.6.0.19 [.] std::basic_string<char, 
>>> std::char_traits<char>, std::allocator<char> >::basic_string 0.91% 
>>> ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 0.88% ceph-osd 
>>> libc-2.17.so [.] __memcpy_ssse3_back 0.81% ceph-osd ceph-osd [.] 
>>> Mutex::Lock 0.79% ceph-osd [kernel.kallsyms] [k] 
>>> copy_user_enhanced_fast_string 0.74% ceph-osd libpthread-2.17.so [.] 
>>> pthread_mutex_unlock 0.67% ceph-osd [kernel.kallsyms] [k] 
>>> _raw_spin_lock 0.63% swapper [kernel.kallsyms] [k] 
>>> native_write_msr_safe 0.62% ceph-osd [kernel.kallsyms] [k] 
>>> avc_has_perm_noaudit 0.58% ceph-osd ceph-osd [.] operator< 0.57% 
>>> ceph-osd [kernel.kallsyms] [k] __schedule 0.57% ceph-osd 
>>> [kernel.kallsyms] [k] __d_lookup_rcu 0.54% swapper [kernel.kallsyms] 
>>> [k] __schedule 
>>> 
>>> 
>>> after osd restart (300k iops) 
>>> ------------------------------ 
>>> 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.92% ceph-osd 
>>> libtcmalloc.so.4.1.2 [.] operator delete 1.86% swapper 
>>> [kernel.kallsyms] [k] intel_idle 1.52% ceph-osd libstdc++.so.6.0.19 
>>> [.] std::basic_string<char, std::char_traits<char>, 
>>> std::allocator<char> >::basic_string 1.34% ceph-osd 
>>> libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>> 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 1.23% ceph-osd 
>>> ceph-osd [.] Mutex::Lock 1.21% ceph-osd libpthread-2.17.so [.] 
>>> pthread_mutex_trylock 1.11% ceph-osd [kernel.kallsyms] [k] 
>>> copy_user_enhanced_fast_string 0.95% ceph-osd libpthread-2.17.so [.] 
>>> pthread_mutex_unlock 0.94% ceph-osd [kernel.kallsyms] [k] 
>>> _raw_spin_lock 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
>>> 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 0.70% ceph-osd 
>>> ceph-osd [.] Message::Message 0.68% ceph-osd [kernel.kallsyms] [k] 
>>> __schedule 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 0.65% 
>>> ceph-osd libtcmalloc.so.4.1.2 [.] 
>>> tcmalloc::CentralFreeList::FetchFromSpans 
>>> 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 0.61% 
>>> ceph-osd ceph-osd [.] 
>>> std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
>>> 0.60% swapper [kernel.kallsyms] [k] __schedule 0.60% ceph-osd 
>>> libstdc++.so.6.0.19 [.] 0x00000000000bdd2b 0.57% ceph-osd ceph-osd [.] 
>>> operator< 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 0.56% ceph-osd 
>>> libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 0.55% ceph-osd 
>>> [kernel.kallsyms] [k] __switch_to 0.54% ceph-osd libc-2.17.so [.] 
>>> vfprintf 0.52% ceph-osd [kernel.kallsyms] [k] fget_light 
>>> 
>>> ----- Mail original ----- 
>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" 
>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>> Envoyé: Jeudi 23 Avril 2015 10:00:34 
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi, 
>>> I'm hitting this bug again today. 
>>> 
>>> So don't seem to be numa related (I have try to flush linux buffer to 
>>> be sure). 
>>> 
>>> and tcmalloc is patched (I don't known how to verify that it's ok). 
>>> 
>>> I don't have restarted osd yet. 
>>> 
>>> Maybe some perf trace could be usefulll ? 
>>> 
>>> 
>>> ----- Mail original ----- 
>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" 
>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>> Envoyé: Mercredi 22 Avril 2015 18:30:26 
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi, 
>>> 
>>>>> I feel it is due to tcmalloc issue 
>>> 
>>> Indeed, I had patched one of my node, but not the other. 
>>> So maybe I have hit this bug. (but I can't confirm, I don't have 
>>> traces). 
>>> 
>>> But numa interleaving seem to help in my case (maybe not from 
>>> 100->300k, but 250k->300k). 
>>> 
>>> I need to do more long tests to confirm that. 
>>> 
>>> 
>>> ----- Mail original ----- 
>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >, 
>>> "aderumier" 
>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski" 
>>> < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org 
>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" 
>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> > 
>>> Envoyé: Mercredi 22 Avril 2015 16:34:33 
>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> I feel it is due to tcmalloc issue 
>>> 
>>> I have seen similar issue in my setup after 20 days. 
>>> 
>>> Thanks, 
>>> Srinivas 
>>> 
>>> 
>>> 
>>> -----Original Message----- 
>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com 
>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf 
>>> Of Mark Nelson 
>>> Sent: Wednesday, April 22, 2015 7:31 PM 
>>> To: Alexandre DERUMIER; Milosz Tanski 
>>> Cc: ceph-devel; ceph-users 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi Alexandre, 
>>> 
>>> We should discuss this at the perf meeting today. We knew NUMA node 
>>> affinity issues were going to crop up sooner or later (and indeed 
>>> already have in some cases), but this is pretty major. It's probably 
>>> time to really dig in and figure out how to deal with this. 
>>> 
>>> Note: this is one of the reasons I like small nodes with single 
>>> sockets and fewer OSDs. 
>>> 
>>> Mark 
>>> 
>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>>> Hi, 
>>>> 
>>>> I have done a lot of test today, and it seem indeed numa related. 
>>>> 
>>>> My numastat was 
>>>> 
>>>> # numastat 
>>>> node0 node1 
>>>> numa_hit 99075422 153976877 
>>>> numa_miss 167490965 1493663 
>>>> numa_foreign 1493663 167491417 
>>>> interleave_hit 157745 167015 
>>>> local_node 99049179 153830554 
>>>> other_node 167517697 1639986 
>>>> 
>>>> So, a lot of miss. 
>>>> 
>>>> In this case , I can reproduce ios going from 85k to 300k iops, up 
>>>> and down. 
>>>> 
>>>> now setting 
>>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>>> 
>>>> and starting osd daemons with 
>>>> 
>>>> numactl --interleave=all /usr/bin/ceph-osd 
>>>> 
>>>> 
>>>> I have a constant 300k iops ! 
>>>> 
>>>> 
>>>> I wonder if it could be improve by binding osd daemons to specific 
>>>> numa node. 
>>>> I have 2 numanode of 10 cores with 6 osd, but I think it also 
>>>> require ceph.conf osd threads tunning. 
>>>> 
>>>> 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org 
>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" 
>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> > 
>>>> Envoyé: Mercredi 22 Avril 2015 12:54:23 
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> 
>>>> 
>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < 
>>>> aderumier@odiso.com <mailto:aderumier@odiso.com> > wrote: 
>>>> 
>>>> 
>>>> I wonder if it could be numa related, 
>>>> 
>>>> I'm using centos 7.1, 
>>>> and auto numa balacning is enabled 
>>>> 
>>>> cat /proc/sys/kernel/numa_balancing = 1 
>>>> 
>>>> Maybe osd daemon access to buffer on wrong numa node. 
>>>> 
>>>> I'll try to reproduce the problem 
>>>> 
>>>> 
>>>> 
>>>> Can you force the degenerate case using numactl? To either affirm or 
>>>> deny your suspicion. 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>>> À: "ceph-devel" < ceph-devel@vger.kernel.org 
>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" < 
>>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> > 
>>>> Envoyé: Mercredi 22 Avril 2015 10:40:05 
>>>> Objet: [ceph-users] strange benchmark problem : restarting osd daemon 
>>>> improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, 
>>>> 
>>>> I was doing some benchmarks, 
>>>> I have found an strange behaviour. 
>>>> 
>>>> Using fio with rbd engine, I was able to reach around 100k iops. 
>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>> 
>>>> then after restarting all osd daemons, 
>>>> 
>>>> the same fio benchmark show now around 300k iops. 
>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>> 
>>>> 
>>>> any ideas? 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> before restarting osd 
>>>> --------------------- 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 ... 
>>>> fio-2.2.7-10-g51e9 
>>>> Starting 10 processes 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 
>>>> iops] [eta 14m:45s] 
>>>> fio: terminating on signal 2 
>>>> 
>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 
>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt= 
>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat 
>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec): 
>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles 
>>>> (usec): 
>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247], 
>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660], 
>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
>>>> | 99.99th=[47360] 
>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82, 
>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) : 
>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01% 
>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths : 
>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit : 
>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, 
>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, 
>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, 
>>>> depth=32 
>>>> 
>>>> Run status group 0 (all jobs): 
>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, 
>>>> mint=26215msec, maxt=26215msec 
>>>> 
>>>> Disk stats (read/write): 
>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> AFTER RESTARTING OSDS 
>>>> ---------------------- 
>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 ... 
>>>> fio-2.2.7-10-g51e9 
>>>> Starting 10 processes 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 
>>>> iops] [eta 01h:09m:27s] 
>>>> fio: terminating on signal 2 
>>>> 
>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 
>>>> 22 10:02:28 2015 read : io=7655.7MB, bw=1036.8MB/s, iops=265218, 
>>>> runt= 7389msec slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>>>> clat 
>>>> (usec): min=8, max=684328, avg=930.43, stdev=6419.12 lat (usec): 
>>>> min=154, max=684342, avg=957.02, stdev=6419.28 clat percentiles 
>>>> (usec): 
>>>> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450], 
>>>> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796], 
>>>> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>>>> | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>>>> | 99.99th=[436224] 
>>>> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46, 
>>>> stdev=28263.82 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
>>>> 250=1.23% lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% lat (msec) 
>>>> : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% lat (msec) : 
>>>> 250=0.01%, 500=0.02%, 750=0.01% cpu : usr=44.06%, sys=11.26%, 
>>>> ctx=642620, majf=0, minf=6832 IO depths : 1=0.1%, 2=0.5%, 4=2.0%, 
>>>> 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 
>>>> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 
>>>> 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% issued : 
>>>> total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : 
>>>> target=0, window=0, percentile=100.00%, depth=32 
>>>> 
>>>> Run status group 0 (all jobs): 
>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, 
>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
>>>> 
>>>> Disk stats (read/write): 
>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> CEPH LOG 
>>>> -------- 
>>>> 
>>>> before restarting osd 
>>>> ---------------------- 
>>>> 
>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
>>>> 
>>>> 
>>>> restarting osd 
>>>> --------------- 
>>>> 
>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
>>>> [INF] osd.0 marked itself down 
>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>> 794 
>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
>>>> kB/s rd, 130 op/s 
>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>> 794 
>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>> 794 
>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
>>>> [INF] osd.1 marked itself down 
>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>> 684 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>> 684 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
>>>> [INF] osd.2 marked itself down 
>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>> 600 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>> 600 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
>>>> [INF] osd.1 10.7.0.152:6800/14848 boot 
>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>> 600 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
>>>> [INF] osd.3 marked itself down 
>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 
>>>> 506 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
>>>> [INF] osd.6 marked itself down 
>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
>>>> [INF] osd.2 10.7.0.152:6812/15410 boot 
>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>> 398 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>> 398 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
>>>> [INF] osd.3 10.7.0.152:6816/15971 boot 
>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>> 398 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
>>>> [INF] osd.7 marked itself down 
>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
>>>> [INF] osd.0 10.7.0.152:6804/14481 boot 
>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 
>>>> 292 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
>>>> [INF] osd.6 10.7.0.153:6808/21955 boot 
>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
>>>> [INF] osd.8 marked itself down 
>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
>>>> [INF] osd.7 10.7.0.153:6804/22536 boot 
>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 
>>>> GB / 1295 GB avail 
>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>> active+undersized+degraded+remapped, 5 peering, 13 active 
>>>> 
>>>> 
>>>> AFTER OSD RESTART 
>>>> ------------------ 
>>>> 
>>>> 
>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>>>> 
>> -- 
>> Best regards, Irek Fasikhov 
>> Mob.: +79229045757 


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                       ` <1416266012.633995646.1429937143054.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-27  5:01                                                         ` Alexandre DERUMIER
       [not found]                                                           ` <94521284.683547748.1430110881936.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-27  5:01 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

Hi,

also, another big difference:

I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.

I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
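
For what it's worth, a minimal sketch of one way to double-check which allocator an osd is actually using (standard ldd/procfs usage, nothing specific to this cluster; an LD_PRELOADed jemalloc would show up in the maps check but not in ldd):

# does the osd binary link tcmalloc or jemalloc?
ldd /usr/bin/ceph-osd | grep -E 'tcmalloc|jemalloc'
# what does a running osd actually have mapped? (takes the first osd pid)
grep -E 'tcmalloc|jemalloc' /proc/$(pidof ceph-osd | awk '{print $1}')/maps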


----- Original message -----
From: "aderumier" <aderumier@odiso.com>
To: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Saturday, April 25, 2015 06:45:43
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

>>We haven't done any kind of real testing on jemalloc, so use at your own 
>>peril. Having said that, we've also been very interested in hearing 
>>community feedback from folks trying it out, so please feel free to give 
>>it a shot. :D 

Some feedback: I have run benchmarks all night, with no speed regression. 

And I get a speed increase with fio with more jobs (with tcmalloc, it seems to be the reverse). 

with tcmalloc : 

10 fio-rbd jobs = 300k iops 
15 fio-rbd jobs = 290k iops 
20 fio-rbd jobs = 270k iops 
40 fio-rbd jobs = 250k iops 

(all with values bouncing up and down during the fio bench) 


with jemalloc: 

10 fio-rbd jobs = 300k iops 
15 fio-rbd jobs = 320k iops 
20 fio-rbd jobs = 330k iops 
40 fio-rbd jobs = 370k iops (could probably get more; currently only 1 client machine, with its 20 cores at 100%) 

(all with constant values during the fio bench) 
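
For illustration, a minimal sketch of how such a job sweep can be scripted, assuming the job file uses fio's environment-variable expansion (i.e. it contains a line like numjobs=${NUMJOBS}); the file name fiorbd matches the job file used earlier in this thread:

for n in 10 15 20 40; do
  NUMJOBS=$n ./fio fiorbd
done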

----- Original message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
Sent: Friday, April 24, 2015 20:02:15
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

We haven't done any kind of real testing on jemalloc, so use at your own 
peril. Having said that, we've also been very interested in hearing 
community feedback from folks trying it out, so please feel free to give 
it a shot. :D 

Mark 

On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
> Is jemalloc recommended in general? Does it also work for firefly? 
> 
> Stefan 
> 
> Excuse my typos; sent from my mobile phone. 
> 
> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
> 
>> Hi, 
>> 
>> I have finished rebuilding ceph with jemalloc, 
>> 
>> and all seems to be working fine. 
>> 
>> I'm getting a constant 300k iops for the moment, so no speed regression. 
>> 
>> I'll do longer benchmarks next week. 
>> 
>> Regards, 
>> 
>> Alexandre 
>> 
>> ----- Original message ----- 
>> From: "Irek Fasikhov" <malmyzh@gmail.com> 
>> To: "Somnath Roy" <Somnath.Roy@sandisk.com> 
>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>> Sent: Friday, April 24, 2015 13:37:52 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>> 
>> Hi, Alexandre! 
>> Have you tried changing the parameter vm.min_free_kbytes? 
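>> 
>> For illustration, a minimal sketch (the value below is only an example; a sensible setting depends on total RAM): 
>> 
>> # show the current reserve 
>> sysctl vm.min_free_kbytes 
>> # raise it, e.g. to 512 MB (example value only) 
>> sysctl -w vm.min_free_kbytes=524288 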
>> 
>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>: 
>> 
>> 
>> Alexandre, 
>> You can configure with --with-jemalloc or ./do_autogen -J to build 
>> ceph with jemalloc. 
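>> 
>> For illustration, the from-source route would look roughly like this (standard autotools steps for ceph of that era; exact flags may vary by version): 
>> 
>> ./autogen.sh 
>> ./configure --with-jemalloc 
>> make -j$(nproc) 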
>> 
>> Thanks & Regards 
>> Somnath 
>> 
>> -----Original Message----- 
>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER 
>> Sent: Thursday, April 23, 2015 4:56 AM 
>> To: Mark Nelson 
>> Cc: ceph-users; ceph-devel; Milosz Tanski 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>>>> If you have the means to compile the same version of ceph with 
>>>> jemalloc, I would be very interested to see how it does. 
>> 
>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks) 
>> 
>> But I don't know how to do it. 
>> I'm running the cluster on centos 7.1; maybe it would be easy to patch 
>> the srpms to rebuild the package with jemalloc. 
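>> 
>> A rough sketch of that srpm route on centos 7 (the spec edit is an assumption, not a verified recipe): 
>> 
>> yumdownloader --source ceph              # needs yum-utils 
>> rpm -ivh ceph-*.src.rpm                  # unpacks into ~/rpmbuild 
>> # edit ~/rpmbuild/SPECS/ceph.spec so %configure gets --with-jemalloc 
>> # (and add jemalloc-devel to BuildRequires) 
>> rpmbuild -ba ~/rpmbuild/SPECS/ceph.spec 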
>> 
>> 
>> 
>> ----- Original message ----- 
>> From: "Mark Nelson" <mnelson@redhat.com> 
>> To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>> Sent: Thursday, April 23, 2015 13:33:00 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>> 
>> Thanks for the testing, Alexandre! 
>> 
>> If you have the means to compile the same version of ceph with 
>> jemalloc, I would be very interested to see how it does. 
>> 
>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
>> will have to deal with it at some point, but perhaps not today. ;) 
>> 
>> Mark 
>> 
>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>>> Maybe it's tcmalloc related. 
>>> I thought I had patched it correctly, but perf shows a lot of time in 
>>> tcmalloc::ThreadCache::ReleaseToCentralCache: 
>>> 
>>> before osd restart (100k) 
>>> ------------------ 
>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>> 8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>>> 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans 
>>> 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>>> 1.63% swapper [kernel.kallsyms] [k] intel_idle 
>>> 1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
>>> 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>>> 1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>>> 0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 
>>> 0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 
>>> 0.81% ceph-osd ceph-osd [.] Mutex::Lock 
>>> 0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
>>> 0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 
>>> 0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 
>>> 0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe 
>>> 0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit 
>>> 0.58% ceph-osd ceph-osd [.] operator< 
>>> 0.57% ceph-osd [kernel.kallsyms] [k] __schedule 
>>> 0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
>>> 0.54% swapper [kernel.kallsyms] [k] __schedule 
>>> 
>>> 
>>> after osd restart (300k iops) 
>>> ------------------------------ 
>>> 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>>> 1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>>> 1.86% swapper [kernel.kallsyms] [k] intel_idle 
>>> 1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>>> 1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>> 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 
>>> 1.23% ceph-osd ceph-osd [.] Mutex::Lock 
>>> 1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 
>>> 1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
>>> 0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock 
>>> 0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock 
>>> 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
>>> 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 
>>> 0.70% ceph-osd ceph-osd [.] Message::Message 
>>> 0.68% ceph-osd [kernel.kallsyms] [k] __schedule 
>>> 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 
>>> 0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>>> 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 
>>> 0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
>>> 0.60% swapper [kernel.kallsyms] [k] __schedule 
>>> 0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b 
>>> 0.57% ceph-osd ceph-osd [.] operator< 
>>> 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 
>>> 0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 
>>> 0.55% ceph-osd [kernel.kallsyms] [k] __switch_to 
>>> 0.54% ceph-osd libc-2.17.so [.] vfprintf 
>>> 0.52% ceph-osd [kernel.kallsyms] [k] fget_light 
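>>> 
>>> For reference, a minimal sketch of how a profile like this is captured (standard perf usage; sampling one osd for 30 seconds is just an example): 
>>> 
>>> perf record -p $(pidof ceph-osd | awk '{print $1}') -- sleep 30 
>>> perf report --sort comm,dso,symbol 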
>>> 
>>> ----- Original message ----- 
>>> From: "aderumier" <aderumier@odiso.com> 
>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>> Sent: Thursday, April 23, 2015 10:00:34 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi, 
>>> I'm hitting this bug again today. 
>>> 
>>> So it doesn't seem to be numa related (I have tried flushing the linux 
>>> buffer to be sure), 
>>> 
>>> and tcmalloc is patched (I don't know how to verify that it's ok). 
>>> 
>>> I haven't restarted the osds yet. 
>>> 
>>> Maybe some perf traces could be useful? 
>>> 
>>> 
>>> ----- Original message ----- 
>>> From: "aderumier" <aderumier@odiso.com> 
>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>> Sent: Wednesday, April 22, 2015 18:30:26 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi, 
>>> 
>>>>> I feel it is due to the tcmalloc issue 
>>> 
>>> Indeed, I had patched one of my nodes, but not the other. 
>>> So maybe I have hit this bug (but I can't confirm; I don't have 
>>> traces). 
>>> 
>>> But numa interleaving seems to help in my case (maybe not from 
>>> 100->300k, but 250k->300k). 
>>> 
>>> I need to do longer tests to confirm that. 
>>> 
>>> 
>>> ----- Original message ----- 
>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com> 
>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>>> Sent: Wednesday, April 22, 2015 16:34:33 
>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>>> 
>>> I feel it is due to the tcmalloc issue. 
>>> 
>>> I have seen a similar issue in my setup after 20 days. 
>>> 
>>> Thanks, 
>>> Srinivas 
>>> 
>>> 
>>> 
>>> -----Original Message----- 
>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson 
>>> Sent: Wednesday, April 22, 2015 7:31 PM 
>>> To: Alexandre DERUMIER; Milosz Tanski 
>>> Cc: ceph-devel; ceph-users 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi Alexandre, 
>>> 
>>> We should discuss this at the perf meeting today. We knew NUMA node 
>>> affinity issues were going to crop up sooner or later (and indeed 
>>> already have in some cases), but this is pretty major. It's probably 
>>> time to really dig in and figure out how to deal with this. 
>>> 
>>> Note: this is one of the reasons I like small nodes with single 
>>> sockets and fewer OSDs. 
>>> 
>>> Mark 
>>> 
>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>>> Hi, 
>>>> 
>>>> I have done a lot of tests today, and it seems indeed numa related. 
>>>> 
>>>> My numastat was 
>>>> 
>>>> # numastat 
>>>> node0 node1 
>>>> numa_hit 99075422 153976877 
>>>> numa_miss 167490965 1493663 
>>>> numa_foreign 1493663 167491417 
>>>> interleave_hit 157745 167015 
>>>> local_node 99049179 153830554 
>>>> other_node 167517697 1639986 
>>>> 
>>>> So, a lot of misses. 
>>>> 
>>>> In this case, I can reproduce ios going from 85k to 300k iops, up 
>>>> and down. 
>>>> 
>>>> now setting 
>>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>>> 
>>>> and starting osd daemons with 
>>>> 
>>>> numactl --interleave=all /usr/bin/ceph-osd 
>>>> 
>>>> 
>>>> I have a constant 300k iops ! 
>>>> 
>>>> 
>>>> I wonder if it could be improved by binding osd daemons to specific 
>>>> numa nodes; see the sketch below. 
>>>> I have 2 numa nodes of 10 cores each with 6 osds, but I think it would 
>>>> also require tuning the osd thread counts in ceph.conf. 
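>>>> 
>>>> For illustration, a minimal sketch of such binding (the node and osd ids are examples; a real layout would match each osd to the node its disks and NIC are attached to): 
>>>> 
>>>> # pin osd.0's cpus and memory to numa node 0 
>>>> numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 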
>>>> 
>>>> 
>>>> 
>>>> ----- Original message ----- 
>>>> From: "Milosz Tanski" <milosz@adfin.com> 
>>>> To: "aderumier" <aderumier@odiso.com> 
>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>>>> Sent: Wednesday, April 22, 2015 12:54:23 
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> 
>>>> 
>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
>>>> 
>>>> 
>>>> I wonder if it could be numa related, 
>>>> 
>>>> I'm using centos 7.1, 
>>>> and auto numa balancing is enabled 
>>>> 
>>>> cat /proc/sys/kernel/numa_balancing = 1 
>>>> 
>>>> Maybe the osd daemons access buffers on the wrong numa node. 
>>>> 
>>>> I'll try to reproduce the problem. 
>>>> 
>>>> 
>>>> 
>>>> Can you force the degenerate case using numactl? To either affirm or 
>>>> deny your suspicion. 
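>>>> 
>>>> For illustration, one way to force the degenerate case (a sketch; the node and osd ids are examples): run an osd's threads on one node but place its memory on the other, which should maximize cross-node traffic: 
>>>> 
>>>> numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 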
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Original message ----- 
>>>> From: "aderumier" <aderumier@odiso.com> 
>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>>>> Sent: Wednesday, April 22, 2015 10:40:05 
>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon 
>>>> improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, 
>>>> 
>>>> I was doing some benchmarks, 
>>>> I have found a strange behaviour. 
>>>> 
>>>> Using fio with rbd engine, I was able to reach around 100k iops. 
>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>> 
>>>> then after restarting all osd daemons, 
>>>> 
>>>> the same fio benchmark show now around 300k iops. 
>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>> 
>>>> 
>>>> any ideas? 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> before restarting osd 
>>>> --------------------- 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 ... 
>>>> fio-2.2.7-10-g51e9 
>>>> Starting 10 processes 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 
>>>> iops] [eta 14m:45s] 
>>>> fio: terminating on signal 2 
>>>> 
>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 
>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt= 
>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat 
>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec): 
>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles 
>>>> (usec): 
>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247], 
>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660], 
>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
>>>> | 99.99th=[47360] 
>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82, 
>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) : 
>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01% 
>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths : 
>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit : 
>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, 
>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, 
>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, 
>>>> depth=32 
>>>> 
>>>> Run status group 0 (all jobs): 
>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, 
>>>> mint=26215msec, maxt=26215msec 
>>>> 
>>>> Disk stats (read/write): 
>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> AFTER RESTARTING OSDS 
>>>> ---------------------- 
>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 ... 
>>>> fio-2.2.7-10-g51e9 
>>>> Starting 10 processes 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> rbd engine: RBD version: 0.1.9 
>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 
>>>> iops] [eta 01h:09m:27s] 
>>>> fio: terminating on signal 2 
>>>> 
>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 
>>>> 22 10:02:28 2015 read : io=7655.7MB, bw=1036.8MB/s, iops=265218, 
>>>> runt= 7389msec slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>>>> clat 
>>>> (usec): min=8, max=684328, avg=930.43, stdev=6419.12 lat (usec): 
>>>> min=154, max=684342, avg=957.02, stdev=6419.28 clat percentiles 
>>>> (usec): 
>>>> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450], 
>>>> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796], 
>>>> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>>>> | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>>>> | 99.99th=[436224] 
>>>> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46, 
>>>> stdev=28263.82 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
>>>> 250=1.23% lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% lat (msec) 
>>>> : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% lat (msec) : 
>>>> 250=0.01%, 500=0.02%, 750=0.01% cpu : usr=44.06%, sys=11.26%, 
>>>> ctx=642620, majf=0, minf=6832 IO depths : 1=0.1%, 2=0.5%, 4=2.0%, 
>>>> 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 
>>>> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 
>>>> 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% issued : 
>>>> total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : 
>>>> target=0, window=0, percentile=100.00%, depth=32 
>>>> 
>>>> Run status group 0 (all jobs): 
>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, 
>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
>>>> 
>>>> Disk stats (read/write): 
>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> CEPH LOG 
>>>> -------- 
>>>> 
>>>> before restarting osd 
>>>> ---------------------- 
>>>> 
>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
>>>> 
>>>> 
>>>> restarting osd 
>>>> --------------- 
>>>> 
>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
>>>> [INF] osd.0 marked itself down 
>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>> stale+active+794 
>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
>>>> kB/s rd, 130 op/s 
>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>> stale+active+794 
>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>> stale+active+794 
>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
>>>> [INF] osd.1 marked itself down 
>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>> stale+active+684 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>> stale+active+684 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
>>>> [INF] osd.2 marked itself down 
>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>> stale+active+600 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>> stale+active+600 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
>>>> [INF] osd.1 10.7.0.152:6800/14848 boot 
>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>> stale+active+600 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
>>>> [INF] osd.3 marked itself down 
>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 
>>>> stale+active+506 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
>>>> [INF] osd.6 marked itself down 
>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
>>>> [INF] osd.2 10.7.0.152:6812/15410 boot 
>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>> stale+active+398 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>> stale+active+398 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
>>>> [INF] osd.3 10.7.0.152:6816/15971 boot 
>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>> stale+active+398 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
>>>> [INF] osd.7 marked itself down 
>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
>>>> [INF] osd.0 10.7.0.152:6804/14481 boot 
>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 
>>>> stale+active+292 
>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>> used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
>>>> [INF] osd.6 10.7.0.153:6808/21955 boot 
>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
>>>> [INF] osd.8 marked itself down 
>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
>>>> [INF] osd.7 10.7.0.153:6804/22536 boot 
>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 
>>>> GB / 1295 GB avail 
>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>> active+undersized+degraded+remapped, 5 peering, 13 active 
>>>> 
>>>> 
>>>> AFTER OSD RESTART 
>>>> ------------------ 
>>>> 
>>>> 
>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>>>> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                           ` <94521284.683547748.1430110881936.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-27  5:46                                                             ` Alexandre DERUMIER
  2015-04-27  6:12                                                               ` [ceph-users] " Venkateswara Rao Jujjuri
       [not found]                                                               ` <1545533798.684536916.1430113581382.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 2 replies; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-27  5:46 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

>>I'll retest tcmalloc, because I was pretty sure I had patched it correctly.

OK, I really think I had patched tcmalloc wrongly.
I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).

So better than jemalloc.
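
(For reference: gperftools also exposes this tuning as an environment variable. Assuming a libtcmalloc recent enough to actually honor it (some older builds silently ignored the variable, which is apparently what the patching works around), the effect can be approximated without rebuilding. A sketch, with the value and the osd id as placeholders:

  export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # e.g. 128MB thread cache
  /usr/bin/ceph-osd -i 0 -f

The variable has to be set in the environment of every osd process.)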


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Monday, 27 April 2015 07:01:21
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi, 

Also, another big difference:

I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.

I'll retest tcmalloc, because I was pretty sure I had patched it correctly.


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Saturday, 25 April 2015 06:45:43
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

>>We haven't done any kind of real testing on jemalloc, so use at your own 
>>peril. Having said that, we've also been very interested in hearing 
>>community feedback from folks trying it out, so please feel free to give 
>>it a shot. :D 

Some feedback: I have run benchmarks all night, no speed regression.

And I have a speed increase with fio with more jobs. (with tcmalloc, it seems to be the reverse)

with tcmalloc:

10 fio-rbd jobs = 300k iops 
15 fio-rbd jobs = 290k iops 
20 fio-rbd jobs = 270k iops 
40 fio-rbd jobs = 250k iops 

(all with up and down values during the fio bench) 


with jemalloc: 

10 fio-rbd jobs = 300k iops 
15 fio-rbd jobs = 320k iops 
20 fio-rbd jobs = 330k iops 
40 fio-rbd jobs = 370k iops (could get more; currently only 1 client machine, with 20 cores at 100%)

(all with constant values during the fio bench)
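
(The fiorbd job file behind these numbers isn't quoted in the thread; a roughly equivalent command line, with the client, pool and image names as placeholder assumptions, would be:

  fio --name=rbd-4k-randread --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
      --rw=randread --bs=4k --iodepth=32 --numjobs=10 --direct=1 --group_reporting

where --numjobs corresponds to the 10/15/20/40 fio-rbd jobs above.)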

----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
Sent: Friday, 24 April 2015 20:02:15
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

We haven't done any kind of real testing on jemalloc, so use at your own 
peril. Having said that, we've also been very interested in hearing 
community feedback from folks trying it out, so please feel free to give 
it a shot. :D 

Mark 

On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
> Is jemalloc recommended in general? Does it also work for firefly?
> 
> Stefan 
> 
> Excuse my typo sent from my mobile phone. 
> 
> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
> 
>> Hi, 
>> 
>> I have finished rebuilding ceph with jemalloc,
>> 
>> all seems to be working fine.
>> 
>> I got a constant 300k iops for the moment, so no speed regression. 
>> 
>> I'll run longer benchmarks next week.
>> 
>> Regards, 
>> 
>> Alexandre 
>> 
>> ----- Original Message -----
>> From: "Irek Fasikhov" <malmyzh@gmail.com>
>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Friday, 24 April 2015 13:37:52
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> 
>> Hi, Alexandre!
>> Have you tried changing the vm.min_free_kbytes parameter?
>> 
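
(vm.min_free_kbytes is a plain sysctl, so trying the suggestion looks roughly like this; the value is just an example, not a recommendation:

  sysctl vm.min_free_kbytes             # show the current reserve
  sysctl -w vm.min_free_kbytes=262144   # e.g. reserve 256MB for the kernel

Add the setting to /etc/sysctl.conf to keep it across reboots.)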
>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>> 
>> 
>> Alexandre, 
>> You can configure with --with-jemalloc or ./do_autogen -J to build 
>> ceph with jemalloc. 
>> 
>> Thanks & Regards 
>> Somnath 
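
(Spelled out, and assuming a hammer-era source tree where the autotools build still carries these options, that is roughly:

  ./autogen.sh
  ./configure --with-jemalloc
  make
  # or, via the wrapper script: ./do_autogen.sh -J

Untested here; check ./configure --help on your branch.)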
>> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
>> Sent: Thursday, April 23, 2015 4:56 AM 
>> To: Mark Nelson 
>> Cc: ceph-users; ceph-devel; Milosz Tanski 
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>> daemon improve performance from 100k iops to 300k iops 
>> 
>>>> If you have the means to compile the same version of ceph with 
>>>> jemalloc, I would be very interested to see how it does. 
>> 
>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks)
>> 
>> But I don't know how to do it.
>> I'm running the cluster on CentOS 7.1; maybe it would be easy to patch
>> the srpms to rebuild the package with jemalloc.
>> 
>> 
>> 
>> ----- Original Message -----
>> From: "Mark Nelson" <mnelson@redhat.com>
>> To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Thursday, 23 April 2015 13:33:00
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> 
>> Thanks for the testing Alexandre! 
>> 
>> If you have the means to compile the same version of ceph with 
>> jemalloc, I would be very interested to see how it does. 
>> 
>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
>> will have to deal with it at some point, but perhaps not today. ;) 
>> 
>> Mark 
>> 
>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>>> Maybe it's tcmalloc related.
>>> I thought I had patched it correctly, but perf shows a lot of
>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>> 
>>> before osd restart (100k) 
>>> ------------------ 
>>>  11.66%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>   8.51%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>   3.04%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>   2.04%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator new
>>>   1.63%  swapper   [kernel.kallsyms]     [k] intel_idle
>>>   1.35%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>   1.33%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete
>>>   1.07%  ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>   0.91%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock
>>>   0.88%  ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back
>>>   0.81%  ceph-osd  ceph-osd              [.] Mutex::Lock
>>>   0.79%  ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string
>>>   0.74%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock
>>>   0.67%  ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock
>>>   0.63%  swapper   [kernel.kallsyms]     [k] native_write_msr_safe
>>>   0.62%  ceph-osd  [kernel.kallsyms]     [k] avc_has_perm_noaudit
>>>   0.58%  ceph-osd  ceph-osd              [.] operator<
>>>   0.57%  ceph-osd  [kernel.kallsyms]     [k] __schedule
>>>   0.57%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>>>   0.54%  swapper   [kernel.kallsyms]     [k] __schedule
>>> 
>>> 
>>> after osd restart (300k iops) 
>>> ------------------------------ 
>>>   3.47%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator new
>>>   1.92%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete
>>>   1.86%  swapper   [kernel.kallsyms]     [k] intel_idle
>>>   1.52%  ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>   1.34%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>   1.24%  ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back
>>>   1.23%  ceph-osd  ceph-osd              [.] Mutex::Lock
>>>   1.21%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock
>>>   1.11%  ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string
>>>   0.95%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock
>>>   0.94%  ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock
>>>   0.78%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu
>>>   0.70%  ceph-osd  [kernel.kallsyms]     [k] tcp_sendmsg
>>>   0.70%  ceph-osd  ceph-osd              [.] Message::Message
>>>   0.68%  ceph-osd  [kernel.kallsyms]     [k] __schedule
>>>   0.66%  ceph-osd  [kernel.kallsyms]     [k] idle_cpu
>>>   0.65%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>   0.64%  swapper   [kernel.kallsyms]     [k] native_write_msr_safe
>>>   0.61%  ceph-osd  ceph-osd              [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>   0.60%  swapper   [kernel.kallsyms]     [k] __schedule
>>>   0.60%  ceph-osd  libstdc++.so.6.0.19   [.] 0x00000000000bdd2b
>>>   0.57%  ceph-osd  ceph-osd              [.] operator<
>>>   0.57%  ceph-osd  ceph-osd              [.] crc32_iscsi_00
>>>   0.56%  ceph-osd  libstdc++.so.6.0.19   [.] std::string::_Rep::_M_dispose
>>>   0.55%  ceph-osd  [kernel.kallsyms]     [k] __switch_to
>>>   0.54%  ceph-osd  libc-2.17.so          [.] vfprintf
>>>   0.52%  ceph-osd  [kernel.kallsyms]     [k] fget_light
>>> 
>>> ----- Original Message -----
>>> From: "aderumier" <aderumier@odiso.com>
>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Thursday, 23 April 2015 10:00:34
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>> 
>>> Hi, 
>>> I'm hitting this bug again today. 
>>> 
>>> So it doesn't seem to be NUMA related (I have tried flushing the linux
>>> buffers to be sure).
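(presumably something like "sync; echo 3 > /proc/sys/vm/drop_caches"; the exact command used isn't shown)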
>>> 
>>> and tcmalloc is patched (I don't know how to verify that it's OK).
>>> 
>>> I haven't restarted the osds yet.
>>> 
>>> Maybe some perf traces could be useful?
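
(A minimal way to capture one, with the pid lookup as an assumption about how the osds were started:

  perf record -F 99 -g -p $(pidof -s ceph-osd) -- sleep 30
  perf report --sort symbol

which produces symbol listings like the ones quoted earlier in this mail.)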
>>> 
>>> 
>>> ----- Original Message -----
>>> From: "aderumier" <aderumier@odiso.com>
>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Wednesday, 22 April 2015 18:30:26
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>> 
>>> Hi, 
>>> 
>>>>> I feel it is due to a tcmalloc issue
>>> 
>>> Indeed, I had patched one of my nodes, but not the other.
>>> So maybe I have hit this bug. (but I can't confirm, I don't have 
>>> traces). 
>>> 
>>> But numa interleaving seems to help in my case (maybe not from
>>> 100->300k, but 250k->300k).
>>> 
>>> I need to run longer tests to confirm that.
>>> 
>>> 
>>> ----- Original Message -----
>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>> Sent: Wednesday, 22 April 2015 16:34:33
>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>> 
>>> I feel it is due to a tcmalloc issue.
>>> 
>>> I have seen a similar issue in my setup after 20 days.
>>> 
>>> Thanks, 
>>> Srinivas 
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
>>> Sent: Wednesday, April 22, 2015 7:31 PM 
>>> To: Alexandre DERUMIER; Milosz Tanski 
>>> Cc: ceph-devel; ceph-users 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi Alexandre, 
>>> 
>>> We should discuss this at the perf meeting today. We knew NUMA node 
>>> affinity issues were going to crop up sooner or later (and indeed 
>>> already have in some cases), but this is pretty major. It's probably 
>>> time to really dig in and figure out how to deal with this. 
>>> 
>>> Note: this is one of the reasons I like small nodes with single 
>>> sockets and fewer OSDs. 
>>> 
>>> Mark 
>>> 
>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>>> Hi, 
>>>> 
>>>> I have done a lot of tests today, and it seems indeed NUMA related.
>>>> 
>>>> My numastat was 
>>>> 
>>>> # numastat
>>>>                      node0      node1
>>>> numa_hit          99075422  153976877
>>>> numa_miss        167490965    1493663
>>>> numa_foreign       1493663  167491417
>>>> interleave_hit      157745     167015
>>>> local_node        99049179  153830554
>>>> other_node       167517697    1639986
>>>> 
>>>> So, a lot of misses.
>>>> 
>>>> In this case, I can reproduce IOs going from 85k to 300k iops, up
>>>> and down.
>>>> 
>>>> now setting 
>>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>>> 
>>>> and starting osd daemons with 
>>>> 
>>>> numactl --interleave=all /usr/bin/ceph-osd 
>>>> 
>>>> 
>>>> I have a constant 300k iops!
>>>> 
>>>> 
>>>> I wonder if it could be improved by binding osd daemons to specific
>>>> numa nodes.
>>>> I have 2 numa nodes of 10 cores each, with 6 osds, but I think it also
>>>> requires tuning the osd thread counts in ceph.conf.
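
(An untested sketch of such a binding; the osd ids and the node split are placeholders:

  numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 -f
  numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 1 -f

i.e. pin each osd, and its memory allocations, to one socket instead of interleaving across both.)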
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>> From: "Milosz Tanski" <milosz@adfin.com>
>>>> To: "aderumier" <aderumier@odiso.com>
>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Wednesday, 22 April 2015 12:54:23
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>> 
>>>> 
>>>> 
>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>> 
>>>> 
>>>> I wonder if it could be numa related, 
>>>> 
>>>> I'm using centos 7.1, 
>>>> and auto numa balancing is enabled
>>>> 
>>>> cat /proc/sys/kernel/numa_balancing = 1 
>>>> 
>>>> Maybe the osd daemons access buffers on the wrong numa node.
>>>> 
>>>> I'll try to reproduce the problem 
>>>> 
>>>> 
>>>> 
>>>> Can you force the degenerate case using numactl? To either affirm or 
>>>> deny your suspicion. 
>>>> 


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-27  5:46                                                             ` Alexandre DERUMIER
@ 2015-04-27  6:12                                                               ` Venkateswara Rao Jujjuri
       [not found]                                                                 ` <CAKKTCLV9fVjprDFPs8YO+zrtox8g_1oOASWVxvY0RBwg8gFbRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]                                                               ` <1545533798.684536916.1430113581382.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  1 sibling, 1 reply; 35+ messages in thread
From: Venkateswara Rao Jujjuri @ 2015-04-27  6:12 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Mark Nelson, ceph-users, ceph-devel, Milosz Tanski

If I want to use the librados API for performance testing, are there any
existing benchmark tools which directly access librados (not through
rbd or the gateway)?

Thanks in advance,
JV
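
(One existing example: ceph itself ships "rados bench", which drives librados directly; the pool name and options below are placeholders:

  rados bench -p rbd 60 write -t 16 --no-cleanup
  rados bench -p rbd 60 rand -t 16   # or 'seq' on releases without the rand mode

Beyond that, writing a small tool against rados/librados.h is the usual route.)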

On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER
<aderumier@odiso.com> wrote:
>>>I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>
> OK, I really think I had patched tcmalloc wrongly.
> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>
> So better than jemalloc.
>
>
>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>>> Maybe it's tcmalloc related.
>>>> I thought I had patched it correctly, but perf shows a lot of
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>
>>>> before osd restart (100k)
>>>> ------------------
>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>  1.63% swapper  [kernel.kallsyms] [k] intel_idle
>>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>  1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>  0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>>>>  0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>>>>  0.81% ceph-osd ceph-osd [.] Mutex::Lock
>>>>  0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>>  0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>>>>  0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>>>>  0.63% swapper  [kernel.kallsyms] [k] native_write_msr_safe
>>>>  0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit
>>>>  0.58% ceph-osd ceph-osd [.] operator<
>>>>  0.57% ceph-osd [kernel.kallsyms] [k] __schedule
>>>>  0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>>  0.54% swapper  [kernel.kallsyms] [k] __schedule
>>>>
>>>>
>>>> after osd restart (300k iops)
>>>> ------------------------------
>>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>  1.86% swapper  [kernel.kallsyms] [k] intel_idle
>>>>  1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>  1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>>>>  1.23% ceph-osd ceph-osd [.] Mutex::Lock
>>>>  1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>>>>  1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>>  0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>>>>  0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>>>>  0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>>  0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg
>>>>  0.70% ceph-osd ceph-osd [.] Message::Message
>>>>  0.68% ceph-osd [kernel.kallsyms] [k] __schedule
>>>>  0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu
>>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>  0.64% swapper  [kernel.kallsyms] [k] native_write_msr_safe
>>>>  0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>>  0.60% swapper  [kernel.kallsyms] [k] __schedule
>>>>  0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b
>>>>  0.57% ceph-osd ceph-osd [.] operator<
>>>>  0.57% ceph-osd ceph-osd [.] crc32_iscsi_00
>>>>  0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose
>>>>  0.55% ceph-osd [kernel.kallsyms] [k] __switch_to
>>>>  0.54% ceph-osd libc-2.17.so [.] vfprintf
>>>>  0.52% ceph-osd [kernel.kallsyms] [k] fget_light
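(Heavy time in tcmalloc::ThreadCache::ReleaseToCentralCache, as in the first profile above, is the classic signature of tcmalloc's thread-cache limit. One commonly suggested mitigation, assuming a gperftools build that honours the variable, is to enlarge the cache when starting the daemon, e.g. to 128 MB:

  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0
)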
>>>>
>>>> ----- Original Message -----
>>>> From: "aderumier" <aderumier@odiso.com>
>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>,
>>>> "Milosz Tanski" <milosz@adfin.com>
>>>> Sent: Thursday, 23 April 2015 10:00:34
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi,
>>>> I'm hitting this bug again today.
>>>>
>>>> So it doesn't seem to be numa related (I have tried flushing the linux
>>>> buffers to be sure).
>>>>
>>>> And tcmalloc is patched (though I don't know how to verify that it's ok;
>>>> see the checks sketched below).
>>>>
>>>> I haven't restarted the osds yet.
>>>>
>>>> Maybe some perf traces could be useful?
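(A few hedged ways to check what the osd binary is actually running with -- package names assume CentOS:

  ldd /usr/bin/ceph-osd | grep -E 'tcmalloc|jemalloc'   # which allocator is linked in
  rpm -q gperftools-libs                                # which tcmalloc build is installed
  perf top -p $(pidof -s ceph-osd)                      # live view of the hot symbols
)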
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "aderumier" <aderumier@odiso.com>
>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>,
>>>> "Milosz Tanski" <milosz@adfin.com>
>>>> Sent: Wednesday, 22 April 2015 18:30:26
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi,
>>>>
>>>>>> I feel it is due to a tcmalloc issue
>>>>
>>>> Indeed, I had patched one of my nodes, but not the other.
>>>> So maybe I have hit this bug. (But I can't confirm; I don't have
>>>> traces.)
>>>>
>>>> But numa interleaving seems to help in my case (maybe not from
>>>> 100k->300k, but 250k->300k).
>>>>
>>>> I need to run longer tests to confirm that.
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>,
>>>> "Milosz Tanski" <milosz@adfin.com>
>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Wednesday, 22 April 2015 16:34:33
>>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> I feel it is due to a tcmalloc issue.
>>>>
>>>> I have seen a similar issue in my setup after 20 days.
>>>>
>>>> Thanks,
>>>> Srinivas
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
>>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>>> To: Alexandre DERUMIER; Milosz Tanski
>>>> Cc: ceph-devel; ceph-users
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi Alexandre,
>>>>
>>>> We should discuss this at the perf meeting today. We knew NUMA node
>>>> affinity issues were going to crop up sooner or later (and indeed
>>>> already have in some cases), but this is pretty major. It's probably
>>>> time to really dig in and figure out how to deal with this.
>>>>
>>>> Note: this is one of the reasons I like small nodes with single
>>>> sockets and fewer OSDs.
>>>>
>>>> Mark
>>>>
>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>>> Hi,
>>>>>
>>>>> I have done a lot of tests today, and it does seem to be numa related.
>>>>>
>>>>> My numastat was
>>>>>
>>>>> # numastat
>>>>>                     node0       node1
>>>>> numa_hit         99075422   153976877
>>>>> numa_miss       167490965     1493663
>>>>> numa_foreign      1493663   167491417
>>>>> interleave_hit     157745      167015
>>>>> local_node       99049179   153830554
>>>>> other_node      167517697     1639986
>>>>>
>>>>> So, a lot of misses.
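(numastat can also be scoped to the osd processes to confirm where their memory actually sits, assuming the numastat from the numactl package, which supports -p:

  numastat -p ceph-osd    # per-node memory of the osd processes
)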
>>>>>
>>>>> In this case, I can reproduce ios going from 85k to 300k iops, up
>>>>> and down.
>>>>>
>>>>> now setting
>>>>> echo 0 > /proc/sys/kernel/numa_balancing
>>>>>
>>>>> and starting osd daemons with
>>>>>
>>>>> numactl --interleave=all /usr/bin/ceph-osd
>>>>>
>>>>>
>>>>> I have a constant 300k iops !
>>>>>
>>>>>
>>>>> I wonder if it could be improved further by binding osd daemons to
>>>>> specific numa nodes; see the sketch below.
>>>>> I have 2 numa nodes of 10 cores each with 6 osds, but I think it would
>>>>> also require tuning the osd thread settings in ceph.conf.
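(As a sketch of that binding idea -- the osd ids and node assignments below are invented for illustration, and the real init scripts would need the same treatment:

  # pin half the osds to each socket, cpus and memory together
  for i in 0 1 2; do numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i $i; done
  for i in 3 4 5; do numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i $i; done
)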
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Milosz Tanski" <milosz@adfin.com>
>>>>> To: "aderumier" <aderumier@odiso.com>
>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>>> Sent: Wednesday, 22 April 2015 12:54:23
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>>>
>>>>>
>>>>> I wonder if it could be numa related.
>>>>>
>>>>> I'm using centos 7.1,
>>>>> and auto numa balancing is enabled:
>>>>>
>>>>> cat /proc/sys/kernel/numa_balancing = 1
>>>>>
>>>>> Maybe the osd daemons access buffers on the wrong numa node.
>>>>>
>>>>> I'll try to reproduce the problem.
>>>>>
>>>>>
>>>>>
>>>>> Can you force the degenerate case using numactl, to either affirm or
>>>>> deny your suspicion?
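(For instance, a deliberately bad placement that runs the osd on one node while forcing all of its memory onto the other -- the osd id is illustrative:

  numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0
)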
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "aderumier" <aderumier@odiso.com>
>>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>>> Sent: Wednesday, 22 April 2015 10:40:05
>>>>> Objet: [ceph-users] strange benchmark problem : restarting osd daemon
>>>>> improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi,
>>>>>
>>>>> I was doing some benchmarks,
>>>>> I have found a strange behaviour.
>>>>>
>>>>> Using fio with rbd engine, I was able to reach around 100k iops.
>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>
>>>>> then after restarting all osd daemons,
>>>>>
>>>>> the same fio benchmark show now around 300k iops.
>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>
>>>>>
>>>>> any ideas?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> before restarting osd
>>>>> ---------------------
>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>> ioengine=rbd, iodepth=32 ...
>>>>> fio-2.2.7-10-g51e9
>>>>> Starting 10 processes
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0
>>>>> iops] [eta 14m:45s]
>>>>> fio: terminating on signal 2
>>>>>
>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr
>>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt=
>>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat
>>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec):
>>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles
>>>>> (usec):
>>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
>>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
>>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
>>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
>>>>> | 99.99th=[47360]
>>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82,
>>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%,
>>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) :
>>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01%
>>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths :
>>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit :
>>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%,
>>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0,
>>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%,
>>>>> depth=32
>>>>>
>>>>> Run status group 0 (all jobs):
>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s,
>>>>> mint=26215msec, maxt=26215msec
>>>>>
>>>>> Disk stats (read/write):
>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>> ioengine=rbd, iodepth=32
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> AFTER RESTARTING OSDS
>>>>> ----------------------
>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>> ioengine=rbd, iodepth=32 ...
>>>>> fio-2.2.7-10-g51e9
>>>>> Starting 10 processes
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> rbd engine: RBD version: 0.1.9
>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops] [eta 01h:09m:27s]
>>>>> fio: terminating on signal 2
>>>>>
>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
>>>>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec
>>>>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
>>>>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
>>>>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
>>>>>     clat percentiles (usec):
>>>>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
>>>>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
>>>>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
>>>>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
>>>>>      | 99.99th=[436224]
>>>>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
>>>>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
>>>>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
>>>>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
>>>>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
>>>>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
>>>>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
>>>>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=32
>>>>>
>>>>> Run status group 0 (all jobs):
>>>>>    READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>>>>>
>>>>> Disk stats (read/write):
>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> CEPH LOG
>>>>> --------
>>>>>
>>>>> before restarting osd
>>>>> ----------------------
>>>>>
>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster
>>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s
>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster
>>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s
>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster
>>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s
>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster
>>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s
>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster
>>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s
>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster
>>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s
>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster
>>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s
>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster
>>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s
>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster
>>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s
>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster
>>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s
>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster
>>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s
>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster
>>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s
>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster
>>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s
>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster
>>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s
>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster
>>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s
>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster
>>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s
>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster
>>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s
>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster
>>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s
>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster
>>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s
>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster
>>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s
>>>>>
>>>>>
>>>>> restarting osd
>>>>> ---------------
>>>>>
>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
>>>>> [INF] osd.0 marked itself down
>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
>>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
>>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>> 794
>>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516
>>>>> kB/s rd, 130 op/s
>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
>>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
>>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>> 794
>>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
>>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>> 794
>>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
>>>>> [INF] osd.1 marked itself down
>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
>>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
>>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>> 684
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
>>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
>>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>> 684
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
>>>>> [INF] osd.2 marked itself down
>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
>>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
>>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>> 600
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
>>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
>>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>> 600
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
>>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
>>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
>>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
>>>>> [INF] osd.1 10.7.0.152:6800/14848 boot
>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
>>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
>>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>> 600
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
>>>>> [INF] osd.3 marked itself down
>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
>>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
>>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
>>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped,
>>>>> 506
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
>>>>> [INF] osd.6 marked itself down
>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
>>>>> [INF] osd.2 10.7.0.152:6812/15410 boot
>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
>>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
>>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>> 398
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
>>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
>>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>> 398
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
>>>>> [INF] osd.3 10.7.0.152:6816/15971 boot
>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
>>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
>>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>> 398
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
>>>>> [INF] osd.7 marked itself down
>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
>>>>> [INF] osd.0 10.7.0.152:6804/14481 boot
>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
>>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
>>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
>>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped,
>>>>> 292
>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
>>>>> [INF] osd.6 10.7.0.153:6808/21955 boot
>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
>>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
>>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
>>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28
>>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
>>>>> [INF] osd.8 marked itself down
>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
>>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
>>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
>>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11
>>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
>>>>> [INF] osd.7 10.7.0.153:6804/22536 boot
>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
>>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
>>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
>>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874
>>>>> GB / 1295 GB avail
>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
>>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
>>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>> active+undersized+degraded+remapped, 5 peering, 13 active
>>>>>
>>>>>
>>>>> AFTER OSD RESTART
>>>>> ------------------
>>>>>
>>>>>
>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
>>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s
>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
>>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s
>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
>>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s
>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
>>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s
>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
>>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s
>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
>>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s
>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
>>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s
>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
>>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s
>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
>>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s
>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
>>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s
>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
>>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s
>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
>>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s
>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
>>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s
>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
>>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s
>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
>>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s
>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
>>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s
>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
>>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s
>>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Best regards, Irek Fasikhov
>>> Mobile: +79229045757
>>>
>>>
>>>
>
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you,
then you win. - Mahatma Gandhi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                 ` <CAKKTCLV9fVjprDFPs8YO+zrtox8g_1oOASWVxvY0RBwg8gFbRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-04-27  6:31                                                                   ` Alexandre DERUMIER
       [not found]                                                                     ` <195300554.685766176.1430116301190.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-27  6:31 UTC (permalink / raw)
  To: Venkateswara Rao Jujjuri; +Cc: ceph-users, ceph-devel, Milosz Tanski

>>If I want to use the librados API for performance testing, are there any
>>existing benchmark tools which directly access librados (not through
>>rbd or the gateway)?

you can use "rados bench" from ceph packages

http://ceph.com/docs/master/man/8/rados/

"
bench seconds mode [ -b objsize ] [ -t threads ]
Benchmark for seconds. The mode can be write, seq, or rand. seq and rand are read benchmarks, either sequential or random. Before running one of the reading benchmarks, run a write benchmark with the --no-cleanup option. The default object size is 4 MB, and the default number of simulated threads (parallel writes) is 16.
"


----- Original Message -----
From: "Venkateswara Rao Jujjuri" <jujjuri@gmail.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Monday, 27 April 2015 08:12:49
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

If I want to use the librados API for performance testing, are there any
existing benchmark tools which directly access librados (not through
rbd or the gateway)?

Thanks in advance, 
JV 

On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER 
<aderumier@odiso.com> wrote: 
>>>I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>
> Ok, I really think I had patched tcmalloc wrongly.
> I have repatched it and reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>
> So better than jemalloc.
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Monday, 27 April 2015 07:01:21
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
> 
> Hi, 
> 
> Also, another big difference:
>
> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>
> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Saturday, 25 April 2015 06:45:43
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
> 
>>>We haven't done any kind of real testing on jemalloc, so use at your own 
>>>peril. Having said that, we've also been very interested in hearing 
>>>community feedback from folks trying it out, so please feel free to give 
>>>it a shot. :D 
> 
> Some feedback: I have run benchmarks all night, with no speed regression.
>
> And I see a speed increase with fio with more jobs. (With tcmalloc, it seems to be the reverse.)
> 
> with tcmalloc : 
> 
> 10 fio-rbd jobs = 300k iops 
> 15 fio-rbd jobs = 290k iops 
> 20 fio-rbd jobs = 270k iops 
> 40 fio-rbd jobs = 250k iops 
> 
> (all with values going up and down during the fio bench)
> 
> 
> with jemalloc: 
> 
> 10 fio-rbd jobs = 300k iops 
> 15 fio-rbd jobs = 320k iops 
> 20 fio-rbd jobs = 330k iops 
> 40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine with 20cores 100%) 
> 
> (all with constant values during the fio bench)
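(For reference, a minimal fio job file for the rbd engine along these lines -- pool, image and client names are assumptions; the thread's actual fiorbd file is not shown:

  [global]
  ioengine=rbd
  clientname=admin
  pool=rbd
  rbdname=fiotest
  rw=randread
  bs=4k
  iodepth=32
  numjobs=10
  group_reporting=1

  [rbd_iodepth32-test]
)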
> 
> ----- Mail original ----- 
> De: "Mark Nelson" <mnelson@redhat.com> 
> À: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com> 
> Envoyé: Vendredi 24 Avril 2015 20:02:15 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> We haven't done any kind of real testing on jemalloc, so use at your own 
> peril. Having said that, we've also been very interested in hearing 
> community feedback from folks trying it out, so please feel free to give 
> it a shot. :D 
> 
> Mark 
> 
> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
>> Is jemalloc recommanded in general? Does it also work for firefly? 
>> 
>> Stefan 
>> 
>> Excuse my typo sent from my mobile phone. 
>> 
>> Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER <aderumier@odiso.com 
>> <mailto:aderumier@odiso.com>>: 
>> 
>>> Hi, 
>>> 
>>> I have finished to rebuild ceph with jemalloc, 
>>> 
>>> all seem to working fine. 
>>> 
>>> I got a constant 300k iops for the moment, so no speed regression. 
>>> 
>>> I'll do more long benchmark next week. 
>>> 
>>> Regards, 
>>> 
>>> Alexandre 
>>> 
>>> ----- Mail original ----- 
>>> De: "Irek Fasikhov" <malmyzh@gmail.com <mailto:malmyzh@gmail.com>> 
>>> À: "Somnath Roy" <Somnath.Roy@sandisk.com 
>>> <mailto:Somnath.Roy@sandisk.com>> 
>>> Cc: "aderumier" <aderumier@odiso.com <mailto:aderumier@odiso.com>>, 
>>> "Mark Nelson" <mnelson@redhat.com <mailto:mnelson@redhat.com>>, 
>>> "ceph-users" <ceph-users@lists.ceph.com 
>>> <mailto:ceph-users@lists.ceph.com>>, "ceph-devel" 
>>> <ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>>, 
>>> "Milosz Tanski" <milosz@adfin.com <mailto:milosz@adfin.com>> 
>>> Envoyé: Vendredi 24 Avril 2015 13:37:52 
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi,Alexandre! 
>>> Do not try to change the parameter vm.min_free_kbytes? 
>>> 
>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy@sandisk.com 
>>> <mailto:Somnath.Roy@sandisk.com> > : 
>>> 
>>> 
>>> Alexandre, 
>>> You can configure with --with-jemalloc or ./do_autogen -J to build 
>>> ceph with jemalloc. 
>>> 
>>> Thanks & Regards 
>>> Somnath 
>>> 
>>> -----Original Message----- 
>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com 
>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Alexandre 
>>> DERUMIER 
>>> Sent: Thursday, April 23, 2015 4:56 AM 
>>> To: Mark Nelson 
>>> Cc: ceph-users; ceph-devel; Milosz Tanski 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>>>> If you have the means to compile the same version of ceph with 
>>>>> jemalloc, I would be very interested to see how it does. 
>>> 
>>> Yes, sure. (I have around 3-4 weeks to do all the benchs) 
>>> 
>>> But I don't know how to do it ? 
>>> I'm running the cluster on centos7.1, maybe it can be easy to patch 
>>> the srpms to rebuild the package with jemalloc. 
>>> 
>>> 
>>> 
>>> ----- Mail original ----- 
>>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> > 
>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >, 
>>> "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" < 
>>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>> Envoyé: Jeudi 23 Avril 2015 13:33:00 
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Thanks for the testing Alexandre! 
>>> 
>>> If you have the means to compile the same version of ceph with 
>>> jemalloc, I would be very interested to see how it does. 
>>> 
>>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
>>> will have to deal with it at some point, but perhaps not today. ;) 
>>> 
>>> Mark 
>>> 
>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>>>> Maybe it's tcmalloc related 
>>>> I thinked to have patched it correctly, but perf show a lot of 
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache 
>>>> 
>>>> before osd restart (100k) 
>>>> ------------------ 
>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] 
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache 
>>>> 8.51% ceph-osd libtcmalloc.so.4.1.2 [.] 
>>>> tcmalloc::CentralFreeList::FetchFromSpans 
>>>> 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] 
>>>> tcmalloc::CentralFreeList::ReleaseToSpans 
>>>> 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.63% swapper 
>>>> [kernel.kallsyms] [k] intel_idle 1.35% ceph-osd libtcmalloc.so.4.1.2 
>>>> [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
>>>> 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 1.07% ceph-osd 
>>>> libstdc++.so.6.0.19 [.] std::basic_string<char, 
>>>> std::char_traits<char>, std::allocator<char> >::basic_string 0.91% 
>>>> ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 0.88% ceph-osd 
>>>> libc-2.17.so [.] __memcpy_ssse3_back 0.81% ceph-osd ceph-osd [.] 
>>>> Mutex::Lock 0.79% ceph-osd [kernel.kallsyms] [k] 
>>>> copy_user_enhanced_fast_string 0.74% ceph-osd libpthread-2.17.so [.] 
>>>> pthread_mutex_unlock 0.67% ceph-osd [kernel.kallsyms] [k] 
>>>> _raw_spin_lock 0.63% swapper [kernel.kallsyms] [k] 
>>>> native_write_msr_safe 0.62% ceph-osd [kernel.kallsyms] [k] 
>>>> avc_has_perm_noaudit 0.58% ceph-osd ceph-osd [.] operator< 0.57% 
>>>> ceph-osd [kernel.kallsyms] [k] __schedule 0.57% ceph-osd 
>>>> [kernel.kallsyms] [k] __d_lookup_rcu 0.54% swapper [kernel.kallsyms] 
>>>> [k] __schedule 
>>>> 
>>>> 
>>>> after osd restart (300k iops) 
>>>> ------------------------------ 
>>>> 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.92% ceph-osd 
>>>> libtcmalloc.so.4.1.2 [.] operator delete 1.86% swapper 
>>>> [kernel.kallsyms] [k] intel_idle 1.52% ceph-osd libstdc++.so.6.0.19 
>>>> [.] std::basic_string<char, std::char_traits<char>, 
>>>> std::allocator<char> >::basic_string 1.34% ceph-osd 
>>>> libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>>> 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 1.23% ceph-osd 
>>>> ceph-osd [.] Mutex::Lock 1.21% ceph-osd libpthread-2.17.so [.] 
>>>> pthread_mutex_trylock 1.11% ceph-osd [kernel.kallsyms] [k] 
>>>> copy_user_enhanced_fast_string 0.95% ceph-osd libpthread-2.17.so [.] 
>>>> pthread_mutex_unlock 0.94% ceph-osd [kernel.kallsyms] [k] 
>>>> _raw_spin_lock 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu 
>>>> 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 0.70% ceph-osd 
>>>> ceph-osd [.] Message::Message 0.68% ceph-osd [kernel.kallsyms] [k] 
>>>> __schedule 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 0.65% 
>>>> ceph-osd libtcmalloc.so.4.1.2 [.] 
>>>> tcmalloc::CentralFreeList::FetchFromSpans 
>>>> 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 0.61% 
>>>> ceph-osd ceph-osd [.] 
>>>> std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
>>>> 0.60% swapper [kernel.kallsyms] [k] __schedule 0.60% ceph-osd 
>>>> libstdc++.so.6.0.19 [.] 0x00000000000bdd2b 0.57% ceph-osd ceph-osd [.] 
>>>> operator< 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 0.56% ceph-osd 
>>>> libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 0.55% ceph-osd 
>>>> [kernel.kallsyms] [k] __switch_to 0.54% ceph-osd libc-2.17.so [.] 
>>>> vfprintf 0.52% ceph-osd [kernel.kallsyms] [k] fget_light 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" 
>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>>> Envoyé: Jeudi 23 Avril 2015 10:00:34 
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, 
>>>> I'm hitting this bug again today. 
>>>> 
>>>> So don't seem to be numa related (I have try to flush linux buffer to 
>>>> be sure). 
>>>> 
>>>> and tcmalloc is patched (I don't known how to verify that it's ok). 
>>>> 
>>>> I don't have restarted osd yet. 
>>>> 
>>>> Maybe some perf trace could be usefulll ? 
>>>> 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" 
>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>>> Envoyé: Mercredi 22 Avril 2015 18:30:26 
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, 
>>>> 
>>>>>> I feel it is due to tcmalloc issue 
>>>> 
>>>> Indeed, I had patched one of my node, but not the other. 
>>>> So maybe I have hit this bug. (but I can't confirm, I don't have 
>>>> traces). 
>>>> 
>>>> But numa interleaving seem to help in my case (maybe not from 
>>>> 100->300k, but 250k->300k). 
>>>> 
>>>> I need to do more long tests to confirm that. 
>>>> 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >, 
>>>> "aderumier" 
>>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski" 
>>>> < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org 
>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" 
>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> > 
>>>> Envoyé: Mercredi 22 Avril 2015 16:34:33 
>>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> I feel it is due to tcmalloc issue 
>>>> 
>>>> I have seen similar issue in my setup after 20 days. 
>>>> 
>>>> Thanks, 
>>>> Srinivas 
>>>> 
>>>> 
>>>> 
>>>> -----Original Message----- 
>>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com 
>>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf 
>>>> Of Mark Nelson 
>>>> Sent: Wednesday, April 22, 2015 7:31 PM 
>>>> To: Alexandre DERUMIER; Milosz Tanski 
>>>> Cc: ceph-devel; ceph-users 
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi Alexandre, 
>>>> 
>>>> We should discuss this at the perf meeting today. We knew NUMA node 
>>>> affinity issues were going to crop up sooner or later (and indeed 
>>>> already have in some cases), but this is pretty major. It's probably 
>>>> time to really dig in and figure out how to deal with this. 
>>>> 
>>>> Note: this is one of the reasons I like small nodes with single 
>>>> sockets and fewer OSDs. 
>>>> 
>>>> Mark 
>>>> 
>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>>>> Hi, 
>>>>> 
>>>>> I have done a lot of test today, and it seem indeed numa related. 
>>>>> 
>>>>> My numastat was 
>>>>> 
>>>>> # numastat 
>>>>> node0 node1 
>>>>> numa_hit 99075422 153976877 
>>>>> numa_miss 167490965 1493663 
>>>>> numa_foreign 1493663 167491417 
>>>>> interleave_hit 157745 167015 
>>>>> local_node 99049179 153830554 
>>>>> other_node 167517697 1639986 
>>>>> 
>>>>> So, a lot of miss. 
>>>>> 
>>>>> In this case , I can reproduce ios going from 85k to 300k iops, up 
>>>>> and down. 
>>>>> 
>>>>> now setting 
>>>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>>>> 
>>>>> and starting osd daemons with 
>>>>> 
>>>>> numactl --interleave=all /usr/bin/ceph-osd 
>>>>> 
>>>>> 
>>>>> I have a constant 300k iops ! 
>>>>> 
>>>>> 
>>>>> I wonder if it could be improve by binding osd daemons to specific 
>>>>> numa node. 
>>>>> I have 2 numanode of 10 cores with 6 osd, but I think it also 
>>>>> require ceph.conf osd threads tunning. 
>>>>> 
>>>>> 
>>>>> 
>>>>> ----- Mail original ----- 
>>>>> De: "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org 
>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" 
>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> > 
>>>>> Envoyé: Mercredi 22 Avril 2015 12:54:23 
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < 
>>>>> aderumier@odiso.com <mailto:aderumier@odiso.com> > wrote: 
>>>>> 
>>>>> 
>>>>> I wonder if it could be numa related, 
>>>>> 
>>>>> I'm using centos 7.1, 
>>>>> and auto numa balacning is enabled 
>>>>> 
>>>>> cat /proc/sys/kernel/numa_balancing = 1 
>>>>> 
>>>>> Maybe osd daemon access to buffer on wrong numa node. 
>>>>> 
>>>>> I'll try to reproduce the problem 
>>>>> 
>>>>> 
>>>>> 
>>>>> Can you force the degenerate case using numactl? To either affirm or 
>>>>> deny your suspicion. 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ----- Mail original ----- 
>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> > 
>>>>> À: "ceph-devel" < ceph-devel@vger.kernel.org 
>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" < 
>>>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> > 
>>>>> Envoyé: Mercredi 22 Avril 2015 10:40:05 
>>>>> Objet: [ceph-users] strange benchmark problem : restarting osd daemon 
>>>>> improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> [...] 
>>>>> 
>>> 
>>> -- 
>>> Best regards, Fasikhov Irek Nurgayazovich 
>>> Mob.: +79229045757 
>>> 
>>> 
>>> 
> 
> 



-- 
Jvrao 
--- 
First they ignore you, then they laugh at you, then they fight you, 
then you win. - Mahatma Gandhi 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                               ` <1545533798.684536916.1430113581382.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-27 14:54                                                                 ` Mark Nelson
       [not found]                                                                   ` <553E4DAA.2070206-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Nelson @ 2015-04-27 14:54 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-users, ceph-devel, Milosz Tanski

Hi Alex,

Is it possible that you were suffering from the bug during the first 
test, but once you reinstalled you hadn't hit it yet? That's a pretty major 
performance swing.  I'm not sure if we can draw any conclusions about 
jemalloc vs tcmalloc until we can figure out what went wrong.

Mark

On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>
> Ok, I really think I had patched tcmalloc wrongly.
> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>
> So better than jemalloc.
>
>
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Monday, April 27, 2015 07:01:21
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> Hi,
>
> also, another big difference:
>
> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>
> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>
>
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Saturday, April 25, 2015 06:45:43
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
>>> We haven't done any kind of real testing on jemalloc, so use at your own
>>> peril. Having said that, we've also been very interested in hearing
>>> community feedback from folks trying it out, so please feel free to give
>>> it a shot. :D
>
> Some feedback: I have run benchmarks all night, no speed regression.
>
> And I have a speed increase with fio with more jobs (with tcmalloc, it seems to be the reverse).
>
> with tcmalloc :
>
> 10 fio-rbd jobs = 300k iops
> 15 fio-rbd jobs = 290k iops
> 20 fio-rbd jobs = 270k iops
> 40 fio-rbd jobs = 250k iops
>
> (all with up and down values during the fio bench)
>
>
> with jemalloc:
>
> 10 fio-rbd jobs = 300k iops
> 15 fio-rbd jobs = 320k iops
> 20 fio-rbd jobs = 330k iops
> 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client machine, with 20 cores at 100%)
>
> (all with constant values during the fio bench)
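>
> For reference, the fiorbd job file would be along these lines (a
> reconstructed sketch; the actual file isn't shown in the thread, and
> the pool/image names are placeholders):
>
> [global]
> ioengine=rbd
> clientname=admin
> pool=test
> rbdname=fio_test
> rw=randread
> bs=4k
> numjobs=10
> group_reporting
>
> [rbd_iodepth32-test]
> iodepth=32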
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@redhat.com>
> To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Friday, April 24, 2015 20:02:15
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> We haven't done any kind of real testing on jemalloc, so use at your own
> peril. Having said that, we've also been very interested in hearing
> community feedback from folks trying it out, so please feel free to give
> it a shot. :D
>
> Mark
>
> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>> Is jemalloc recommended in general? Does it also work for firefly?
>>
>> Stefan
>>
>> Excuse my typo sent from my mobile phone.
>>
>> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>
>>> Hi,
>>>
>>> I have finished rebuilding ceph with jemalloc,
>>>
>>> all seems to be working fine.
>>>
>>> I got a constant 300k iops for the moment, so no speed regression.
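>>>
>>> A quick way to double-check which allocator a build actually links
>>> (assuming the default binary path):
>>>
>>> ldd /usr/bin/ceph-osd | grep -E 'jemalloc|tcmalloc'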
>>>
>>> I'll do longer benchmarks next week.
>>>
>>> Regards,
>>>
>>> Alexandre
>>>
>>> ----- Original Message -----
>>> From: "Irek Fasikhov" <malmyzh@gmail.com>
>>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Friday, April 24, 2015 13:37:52
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> Hi, Alexandre!
>>> Have you tried changing the vm.min_free_kbytes parameter?
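>>>
>>> For example, with a purely illustrative value:
>>>
>>> sysctl -w vm.min_free_kbytes=2097152
>>>
>>> and the same setting in /etc/sysctl.conf to make it persistent.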
>>>
>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>>>
>>>
>>> Alexandre,
>>> You can configure with --with-jemalloc or ./do_autogen -J to build
>>> ceph with jemalloc.
>>>
>>> Thanks & Regards
>>> Somnath
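>>>
>>> As a minimal sketch of those two routes (assuming a ceph source
>>> checkout with the autotools build; the script name may be
>>> do_autogen.sh in-tree):
>>>
>>> ./autogen.sh && ./configure --with-jemalloc && make
>>> # or
>>> ./do_autogen.sh -J && make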
>>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf
>>> Of Alexandre DERUMIER
>>> Sent: Thursday, April 23, 2015 4:56 AM
>>> To: Mark Nelson
>>> Cc: ceph-users; ceph-devel; Milosz Tanski
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>>>> If you have the means to compile the same version of ceph with
>>>>> jemalloc, I would be very interested to see how it does.
>>>
>>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks)
>>>
>>> But I don't know how to do it.
>>> I'm running the cluster on centos 7.1; maybe it would be easy to patch
>>> the srpms to rebuild the package with jemalloc.
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: "Mark Nelson" <mnelson@redhat.com>
>>> To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Thursday, April 23, 2015 13:33:00
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> Thanks for the testing Alexandre!
>>>
>>> If you have the means to compile the same version of ceph with
>>> jemalloc, I would be very interested to see how it does.
>>>
>>> In some ways I'm glad it turned out not to be NUMA. I still suspect we
>>> will have to deal with it at some point, but perhaps not today. ;)
>>>
>>> Mark
>>>
>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>>> Maybe it's tcmalloc related. I thought I had patched it correctly,
>>>> but perf shows a lot of tcmalloc::ThreadCache::ReleaseToCentralCache:
>>>>
>>>> before osd restart (100k)
>>>> ------------------
>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>> 8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>> 3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>> 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>> 1.63% swapper [kernel.kallsyms] [k] intel_idle
>>>> 1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>> 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>> 1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>> 0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>>>> 0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>>>> 0.81% ceph-osd ceph-osd [.] Mutex::Lock
>>>> 0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>> 0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>>>> 0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>>>> 0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>>> 0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit
>>>> 0.58% ceph-osd ceph-osd [.] operator<
>>>> 0.57% ceph-osd [kernel.kallsyms] [k] __schedule
>>>> 0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>> 0.54% swapper [kernel.kallsyms] [k] __schedule
>>>>
>>>>
>>>> after osd restart (300k iops)
>>>> ------------------------------
>>>> 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>> 1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>> 1.86% swapper [kernel.kallsyms] [k] intel_idle
>>>> 1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>> 1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>> 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>>>> 1.23% ceph-osd ceph-osd [.] Mutex::Lock
>>>> 1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>>>> 1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>> 0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>>>> 0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>>>> 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>> 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg
>>>> 0.70% ceph-osd ceph-osd [.] Message::Message
>>>> 0.68% ceph-osd [kernel.kallsyms] [k] __schedule
>>>> 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu
>>>> 0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>> 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>>> 0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>> 0.60% swapper [kernel.kallsyms] [k] __schedule
>>>> 0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b
>>>> 0.57% ceph-osd ceph-osd [.] operator<
>>>> 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00
>>>> 0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose
>>>> 0.55% ceph-osd [kernel.kallsyms] [k] __switch_to
>>>> 0.54% ceph-osd libc-2.17.so [.] vfprintf
>>>> 0.52% ceph-osd [kernel.kallsyms] [k] fget_light
>>>>
>>>> ----- Original Message -----
>>>> From: "aderumier" <aderumier@odiso.com>
>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>> Sent: Thursday, April 23, 2015 10:00:34
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi,
>>>> I'm hitting this bug again today.
>>>>
>>>> So it doesn't seem to be numa related (I have tried flushing the linux
>>>> buffers to be sure),
>>>>
>>>> and tcmalloc is patched (I don't know how to verify that it's ok).
>>>>
>>>> I haven't restarted the osds yet.
>>>>
>>>> Maybe some perf trace could be useful?
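>>>>
>>>> Something along these lines should do (plain perf usage, nothing
>>>> ceph-specific; the pid selection is just an example):
>>>>
>>>> perf top -p $(pgrep -d, ceph-osd)
>>>> # or record ~30s for offline analysis:
>>>> perf record -F 99 -g -p $(pgrep -d, ceph-osd) -- sleep 30
>>>> perf report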
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "aderumier" <aderumier@odiso.com>
>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>> Sent: Wednesday, April 22, 2015 18:30:26
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi,
>>>>
>>>>>> I feel it is due to the tcmalloc issue
>>>>
>>>> Indeed, I had patched one of my nodes, but not the other,
>>>> so maybe I have hit this bug (but I can't confirm, I don't have
>>>> traces).
>>>>
>>>> But numa interleaving seems to help in my case (maybe not from
>>>> 100->300k, but 250k->300k).
>>>>
>>>> I need to run longer tests to confirm that.
>>>>
>>>>
>>>> ----- Mail original -----
>>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >,
>>>> "aderumier"
>>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski"
>>>> < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>> Envoyé: Mercredi 22 Avril 2015 16:34:33
>>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> I feel it is due to the tcmalloc issue.
>>>>
>>>> I have seen a similar issue in my setup after 20 days.
>>>>
>>>> Thanks,
>>>> Srinivas
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf
>>>> Of Mark Nelson
>>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>>> To: Alexandre DERUMIER; Milosz Tanski
>>>> Cc: ceph-devel; ceph-users
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi Alexandre,
>>>>
>>>> We should discuss this at the perf meeting today. We knew NUMA node
>>>> affinity issues were going to crop up sooner or later (and indeed
>>>> already have in some cases), but this is pretty major. It's probably
>>>> time to really dig in and figure out how to deal with this.
>>>>
>>>> Note: this is one of the reasons I like small nodes with single
>>>> sockets and fewer OSDs.
>>>>
>>>> Mark
>>>>
>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>>> [...]
>>>>>
>>>>>
>>>>> restarting osd
>>>>> ---------------
>>>>>
>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
>>>>> [INF] osd.0 marked itself down
>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
>>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
>>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516
>>>>> kB/s rd, 130 op/s
>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
>>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
>>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
>>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
>>>>> [INF] osd.1 marked itself down
>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
>>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
>>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
>>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
>>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
>>>>> [INF] osd.2 marked itself down
>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
>>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
>>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
>>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
>>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
>>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
>>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
>>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
>>>>> [INF] osd.1 10.7.0.152:6800/14848 boot
>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
>>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
>>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
>>>>> [INF] osd.3 marked itself down
>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
>>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
>>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
>>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped,
>>>>> 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
>>>>> [INF] osd.6 marked itself down
>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
>>>>> [INF] osd.2 10.7.0.152:6812/15410 boot
>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
>>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
>>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
>>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in
>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
>>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
>>>>> [INF] osd.3 10.7.0.152:6816/15971 boot
>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
>>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
>>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
>>>>> [INF] osd.7 marked itself down
>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
>>>>> [INF] osd.0 10.7.0.152:6804/14481 boot
>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
>>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
>>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
>>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped,
>>>>> 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>> used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
>>>>> [INF] osd.6 10.7.0.153:6808/21955 boot
>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
>>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
>>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
>>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28
>>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
>>>>> [INF] osd.8 marked itself down
>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
>>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in
>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
>>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
>>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11
>>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
>>>>> [INF] osd.7 10.7.0.153:6804/22536 boot
>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
>>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
>>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
>>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874
>>>>> GB / 1295 GB avail
>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
>>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in
>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
>>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>> active+undersized+degraded+remapped, 5 peering, 13 active
>>>>>
>>>>>
>>>>> AFTER OSD RESTART
>>>>> ------------------
>>>>>
>>>>>
>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
>>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s
>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
>>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s
>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
>>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s
>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
>>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s
>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
>>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s
>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
>>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s
>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
>>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s
>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
>>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s
>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
>>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s
>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
>>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s
>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
>>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s
>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
>>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s
>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
>>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s
>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
>>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s
>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
>>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s
>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
>>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s
>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
>>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Best regards, Fasikhov Irek Nurgayazovich
>>> Mob.: +79229045757
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                   ` <553E4DAA.2070206-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-04-27 15:11                                                                     ` Alexandre DERUMIER
  2015-04-27 16:34                                                                       ` [ceph-users] " Mark Nelson
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-27 15:11 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

>>Is it possible that you were suffering from the bug during the first 
>>test but once reinstalled you hadn't hit it yet?  

Yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
I had patched it, but I think the patch alone is not enough.
I still hit this bug at random, but mainly when I have a "lot" of concurrent clients (20-40):
the more clients, the lower the iops.


Today, I tried starting the osds with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M,
and now it's working fine in all my benchmarks.
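For reference, TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is just an environment variable that tcmalloc reads at startup, so it only needs to be exported before the daemon is launched. A minimal sketch, assuming a manually started osd (128M is the value I tested, not a validated recommendation):

export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128M in bytes; inherited by the daemon
/usr/bin/ceph-osd -i 0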


>>That's a pretty major 
>>performance swing.  I'm not sure if we can draw any conclusions about 
>>jemalloc vs tcmalloc until we can figure out what went wrong.

From my benchmarks, jemalloc uses a little more cpu than tcmalloc (maybe 1% or 2%).
tcmalloc seems to work better, with correct tuning of thread_cache_bytes.


But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
Maybe Somnath can tell us?
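Failing a formula, a brute-force sweep is one way to probe it. A rough sketch (the values and the crude restart are illustrative only; fiorbd is the job file used earlier in this thread):

for bytes in 33554432 67108864 134217728 268435456; do    # 32M .. 256M
    killall ceph-osd; sleep 5                             # crude restart, for illustration only
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$bytes /usr/bin/ceph-osd -i 0
    ./fio fiorbd                                          # compare iops across values
done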


----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Monday, 27 April 2015 16:54:34
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi Alex, 

Is it possible that you were suffering from the bug during the first 
test but once reinstalled you hadn't hit it yet? That's a pretty major 
performance swing. I'm not sure if we can draw any conclusions about 
jemalloc vs tcmalloc until we can figure out what went wrong. 

Mark 

On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote: 
>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly. 
> 
> Ok, I really think I had patched tcmalloc wrongly. 
> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread). 
> 
> So better than jemalloc. 
> 
> 
> ----- Original Message ----- 
> From: "aderumier" <aderumier@odiso.com> 
> To: "Mark Nelson" <mnelson@redhat.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Sent: Monday, 27 April 2015 07:01:21 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi, 
> 
> Also, another big difference: 
> 
> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc. 
> 
> I'll retest tcmalloc, because I was pretty sure I had patched it correctly. 
> 
> 
> ----- Original Message ----- 
> From: "aderumier" <aderumier@odiso.com> 
> To: "Mark Nelson" <mnelson@redhat.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Sent: Saturday, 25 April 2015 06:45:43 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
>>> We haven't done any kind of real testing on jemalloc, so use at your own 
>>> peril. Having said that, we've also been very interested in hearing 
>>> community feedback from folks trying it out, so please feel free to give 
>>> it a shot. :D 
> 
> Some feedback: I have run benchmarks all night, no speed regression. 
> 
> And I see a speed increase with fio with more jobs (with tcmalloc, it seems to be the reverse). 
> 
> with tcmalloc : 
> 
> 10 fio-rbd jobs = 300k iops 
> 15 fio-rbd jobs = 290k iops 
> 20 fio-rbd jobs = 270k iops 
> 40 fio-rbd jobs = 250k iops 
> 
> (all with values fluctuating up and down during the fio bench) 
> 
> 
> with jemalloc: 
> 
> 10 fio-rbd jobs = 300k iops 
> 15 fio-rbd jobs = 320k iops 
> 20 fio-rbd jobs = 330k iops 
> 40 fio-rbd jobs = 370k iops (could get more, but currently only 1 client machine, with its 20 cores at 100%) 
> 
> (all with constant values during the fio bench) 
> 
> ----- Original Message ----- 
> From: "Mark Nelson" <mnelson@redhat.com> 
> To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com> 
> Sent: Friday, 24 April 2015 20:02:15 
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> We haven't done any kind of real testing on jemalloc, so use at your own 
> peril. Having said that, we've also been very interested in hearing 
> community feedback from folks trying it out, so please feel free to give 
> it a shot. :D 
> 
> Mark 
> 
> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
>> Is jemalloc recommended in general? Does it also work for firefly? 
>> 
>> Stefan 
>> 
>> Excuse my typos; sent from my mobile phone. 
>> 
>> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote: 
>> 
>>> Hi, 
>>> 
>>> I have finished rebuilding ceph with jemalloc; 
>>> 
>>> all seems to be working fine. 
>>> 
>>> I'm getting a constant 300k iops for the moment, so no speed regression. 
>>> 
>>> I'll do more long benchmark next week. 
>>> 
>>> Regards, 
>>> 
>>> Alexandre 
>>> 
>>> ----- Original Message ----- 
>>> From: "Irek Fasikhov" <malmyzh@gmail.com> 
>>> To: "Somnath Roy" <Somnath.Roy@sandisk.com> 
>>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, 
>>> "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" 
>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>> Sent: Friday, 24 April 2015 13:37:52 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Hi, Alexandre! 
>>> Have you tried changing the vm.min_free_kbytes parameter? 
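For example (sysctl is the standard way to set it; the value is purely illustrative, not a recommendation):

sysctl -w vm.min_free_kbytes=262144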
>>> 
>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>: 
>>> 
>>> 
>>> Alexandre, 
>>> You can configure with --with-jemalloc or ./do_autogen -J to build 
>>> ceph with jemalloc. 
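From a source checkout that would look something like this (a sketch; it assumes the usual build dependencies, plus jemalloc-devel, are already installed):

./autogen.sh
./configure --with-jemalloc
make -j$(nproc)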
>>> 
>>> Thanks & Regards 
>>> Somnath 
>>> 
>>> -----Original Message----- 
>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre 
>>> DERUMIER 
>>> Sent: Thursday, April 23, 2015 4:56 AM 
>>> To: Mark Nelson 
>>> Cc: ceph-users; ceph-devel; Milosz Tanski 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>>>> If you have the means to compile the same version of ceph with 
>>>>> jemalloc, I would be very interested to see how it does. 
>>> 
>>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks) 
>>> 
>>> But I don't know how to do it. 
>>> I'm running the cluster on centos7.1; maybe it would be easy to patch 
>>> the srpms to rebuild the package with jemalloc. 
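A rough outline of the srpm route (the package names and spec details here are assumptions, not a tested recipe):

yum install jemalloc-devel rpm-build
rpm -ivh ceph-*.src.rpm                      # unpacks sources into ~/rpmbuild
# edit ~/rpmbuild/SPECS/ceph.spec so %configure is passed --with-jemalloc
rpmbuild -ba ~/rpmbuild/SPECS/ceph.spec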
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> From: "Mark Nelson" <mnelson@redhat.com> 
>>> To: "aderumier" <aderumier@odiso.com>, 
>>> "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" 
>>> <ceph-devel@vger.kernel.org>, 
>>> "Milosz Tanski" <milosz@adfin.com> 
>>> Sent: Thursday, 23 April 2015 13:33:00 
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>> daemon improve performance from 100k iops to 300k iops 
>>> 
>>> Thanks for the testing Alexandre! 
>>> 
>>> If you have the means to compile the same version of ceph with 
>>> jemalloc, I would be very interested to see how it does. 
>>> 
>>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
>>> will have to deal with it at some point, but perhaps not today. ;) 
>>> 
>>> Mark 
>>> 
>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>>>> Maybe it's tcmalloc related. 
>>>> I thought I had patched it correctly, but perf shows a lot of 
>>>> tcmalloc::ThreadCache::ReleaseToCentralCache: 
>>>> 
>>>> before osd restart (100k) 
>>>> ------------------ 
>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans 
>>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle 
>>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
>>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock 
>>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back 
>>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock 
>>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string 
>>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock 
>>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock 
>>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe 
>>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit 
>>>>  0.58% ceph-osd ceph-osd             [.] operator< 
>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule 
>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu 
>>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule 
>>>> 
>>>> 
>>>> after osd restart (300k iops) 
>>>> ------------------------------ 
>>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 
>>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 
>>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle 
>>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
>>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
>>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back 
>>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock 
>>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock 
>>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string 
>>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock 
>>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock 
>>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu 
>>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg 
>>>>  0.70% ceph-osd ceph-osd             [.] Message::Message 
>>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule 
>>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu 
>>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans 
>>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe 
>>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
>>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule 
>>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b 
>>>>  0.57% ceph-osd ceph-osd             [.] operator< 
>>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00 
>>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose 
>>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to 
>>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf 
>>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light 
>>>> 
>>>> ----- Original Message ----- 
>>>> From: "aderumier" <aderumier@odiso.com> 
>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" 
>>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>>> Sent: Thursday, 23 April 2015 10:00:34 
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, 
>>>> I'm hitting this bug again today. 
>>>> 
>>>> So it doesn't seem to be numa related (I have tried flushing the linux buffers to 
>>>> be sure). 
>>>> 
>>>> And tcmalloc is patched (though I don't know how to verify that it's ok). 
>>>> 
>>>> I haven't restarted the osds yet. 
>>>> 
>>>> Maybe some perf traces could be useful? 
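One way to watch for the symptom live, without restarting anything (a sketch; the pid lookup is illustrative):

perf top -p $(pidof ceph-osd | tr ' ' ',')
# if tcmalloc::ThreadCache::ReleaseToCentralCache dominates, the thread caches are thrashing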
>>>> 
>>>> 
>>>> ----- Original Message ----- 
>>>> From: "aderumier" <aderumier@odiso.com> 
>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" 
>>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>>> Sent: Wednesday, 22 April 2015 18:30:26 
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, 
>>>> 
>>>>>> I feel it is due to tcmalloc issue 
>>>> 
>>>> Indeed, I had patched one of my nodes, but not the other, 
>>>> so maybe I have hit this bug (but I can't confirm; I don't have 
>>>> traces). 
>>>> 
>>>> But numa interleaving seems to help in my case (maybe not from 
>>>> 100->300k, but 250k->300k). 
>>>> 
>>>> I need to run longer tests to confirm that. 
>>>> 
>>>> 
>>>> ----- Original Message ----- 
>>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, 
>>>> "Milosz Tanski" <milosz@adfin.com> 
>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" 
>>>> <ceph-users@lists.ceph.com> 
>>>> Sent: Wednesday, 22 April 2015 16:34:33 
>>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> I feel it is due to the tcmalloc issue. 
>>>> 
>>>> I have seen a similar issue in my setup after 20 days. 
>>>> 
>>>> Thanks, 
>>>> Srinivas 
>>>> 
>>>> 
>>>> 
>>>> -----Original Message----- 
>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf 
>>>> Of Mark Nelson 
>>>> Sent: Wednesday, April 22, 2015 7:31 PM 
>>>> To: Alexandre DERUMIER; Milosz Tanski 
>>>> Cc: ceph-devel; ceph-users 
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi Alexandre, 
>>>> 
>>>> We should discuss this at the perf meeting today. We knew NUMA node 
>>>> affinity issues were going to crop up sooner or later (and indeed 
>>>> already have in some cases), but this is pretty major. It's probably 
>>>> time to really dig in and figure out how to deal with this. 
>>>> 
>>>> Note: this is one of the reasons I like small nodes with single 
>>>> sockets and fewer OSDs. 
>>>> 
>>>> Mark 
>>>> 
>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>>>> Hi, 
>>>>> 
>>>>> I have done a lot of tests today, and it seems indeed numa related. 
>>>>> 
>>>>> My numastat was 
>>>>> 
>>>>> # numastat 
>>>>>                     node0       node1 
>>>>> numa_hit         99075422   153976877 
>>>>> numa_miss       167490965     1493663 
>>>>> numa_foreign      1493663   167491417 
>>>>> interleave_hit     157745      167015 
>>>>> local_node       99049179   153830554 
>>>>> other_node      167517697     1639986 
>>>>> 
>>>>> So, a lot of misses. 
>>>>> 
>>>>> In this case, I can reproduce iops going from 85k to 300k, up 
>>>>> and down. 
>>>>> 
>>>>> now setting 
>>>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>>>> 
>>>>> and starting osd daemons with 
>>>>> 
>>>>> numactl --interleave=all /usr/bin/ceph-osd 
>>>>> 
>>>>> 
>>>>> I have a constant 300k iops ! 
>>>>> 
>>>>> 
>>>>> I wonder if it could be improved by binding osd daemons to a specific 
>>>>> numa node, as sketched below. 
>>>>> I have 2 numa nodes of 10 cores each with 6 osds, but I think it would 
>>>>> also require tuning the osd thread counts in ceph.conf. 
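Something like this, one binding per daemon (a sketch; the osd ids and node numbers are illustrative, and memory placement likely matters as much as cpu placement):

numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0
numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 3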
>>>>> 
>>>>> 
>>>>> 
>>>>> ----- Original Message ----- 
>>>>> From: "Milosz Tanski" <milosz@adfin.com> 
>>>>> To: "aderumier" <aderumier@odiso.com> 
>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" 
>>>>> <ceph-users@lists.ceph.com> 
>>>>> Sent: Wednesday, 22 April 2015 12:54:23 
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER 
>>>>> <aderumier@odiso.com> wrote: 
>>>>> 
>>>>> 
>>>>> I wonder if it could be numa related, 
>>>>> 
>>>>> I'm using centos 7.1, 
>>>>> and auto numa balancing is enabled: 
>>>>> 
>>>>> cat /proc/sys/kernel/numa_balancing = 1 
>>>>> 
>>>>> Maybe the osd daemons access buffers on the wrong numa node. 
>>>>> 
>>>>> I'll try to reproduce the problem 
>>>>> 
>>>>> 
>>>>> 
>>>>> Can you force the degenerate case using numactl? To either affirm or 
>>>>> deny your suspicion. 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ----- Original Message ----- 
>>>>> From: "aderumier" <aderumier@odiso.com> 
>>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" 
>>>>> <ceph-users@lists.ceph.com> 
>>>>> Sent: Wednesday, 22 April 2015 10:40:05 
>>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon 
>>>>> improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> Hi, 
>>>>> 
>>>>> I was doing some benchmarks, 
>>>>> I have found a strange behaviour. 
>>>>> 
>>>>> Using fio with rbd engine, I was able to reach around 100k iops. 
>>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>>> 
>>>>> then after restarting all osd daemons, 
>>>>> 
>>>>> the same fio benchmark show now around 300k iops. 
>>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>>> 
>>>>> 
>>>>> any ideas? 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> before restarting osd 
>>>>> --------------------- 
>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>>> ioengine=rbd, iodepth=32 ... 
>>>>> fio-2.2.7-10-g51e9 
>>>>> Starting 10 processes 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 
>>>>> iops] [eta 14m:45s] 
>>>>> fio: terminating on signal 2 
>>>>> 
>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 
>>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt= 
>>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat 
>>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec): 
>>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles 
>>>>> (usec): 
>>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247], 
>>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660], 
>>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
>>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
>>>>> | 99.99th=[47360] 
>>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82, 
>>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
>>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) : 
>>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01% 
>>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths : 
>>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit : 
>>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, 
>>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, 
>>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, 
>>>>> depth=32 
>>>>> 
>>>>> Run status group 0 (all jobs): 
>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, 
>>>>> mint=26215msec, maxt=26215msec 
>>>>> 
>>>>> Disk stats (read/write): 
>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>>> ioengine=rbd, iodepth=32 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> AFTER RESTARTING OSDS 
>>>>> ---------------------- 
>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>>> ioengine=rbd, iodepth=32 ... 
>>>>> fio-2.2.7-10-g51e9 
>>>>> Starting 10 processes 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> rbd engine: RBD version: 0.1.9 
>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 
>>>>> iops] [eta 01h:09m:27s] 
>>>>> fio: terminating on signal 2 
>>>>> 
>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015 
>>>>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt= 7389msec 
>>>>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>>>>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12 
>>>>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28 
>>>>>     clat percentiles (usec): 
>>>>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450], 
>>>>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796], 
>>>>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>>>>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>>>>>      | 99.99th=[436224] 
>>>>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82 
>>>>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23% 
>>>>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% 
>>>>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% 
>>>>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01% 
>>>>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832 
>>>>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% 
>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% 
>>>>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=32 
>>>>> 
>>>>> Run status group 0 (all jobs): 
>>>>>    READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
>>>>> 
>>>>> Disk stats (read/write): 
>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> CEPH LOG 
>>>>> -------- 
>>>>> 
>>>>> before restarting osd 
>>>>> ---------------------- 
>>>>> 
>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
>>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
>>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
>>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
>>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
>>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
>>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
>>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
>>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
>>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
>>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
>>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
>>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
>>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
>>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
>>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
>>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
>>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
>>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
>>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
>>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
>>>>> 
>>>>> 
>>>>> restarting osd 
>>>>> --------------- 
>>>>> 
>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
>>>>> [INF] osd.0 marked itself down 
>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
>>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
>>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
>>>>> kB/s rd, 130 op/s 
>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
>>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
>>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
>>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
>>>>> [INF] osd.1 marked itself down 
>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
>>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
>>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
>>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
>>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
>>>>> [INF] osd.2 marked itself down 
>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
>>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
>>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
>>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
>>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
>>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
>>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
>>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
>>>>> [INF] osd.1 10.7.0.152:6800/14848 boot 
>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
>>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
>>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
>>>>> [INF] osd.3 marked itself down 
>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
>>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
>>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
>>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 
>>>>> 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
>>>>> [INF] osd.6 marked itself down 
>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
>>>>> [INF] osd.2 10.7.0.152:6812/15410 boot 
>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
>>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
>>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
>>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
>>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
>>>>> [INF] osd.3 10.7.0.152:6816/15971 boot 
>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
>>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
>>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
>>>>> [INF] osd.7 marked itself down 
>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
>>>>> [INF] osd.0 10.7.0.152:6804/14481 boot 
>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
>>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
>>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
>>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 
>>>>> 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>> used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
>>>>> [INF] osd.6 10.7.0.153:6808/21955 boot 
>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
>>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
>>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
>>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
>>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
>>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
>>>>> [INF] osd.8 marked itself down 
>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
>>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
>>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
>>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
>>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
>>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
>>>>> [INF] osd.7 10.7.0.153:6804/22536 boot 
>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
>>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
>>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
>>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
>>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 
>>>>> GB / 1295 GB avail 
>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
>>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
>>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
>>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>>> active+undersized+degraded+remapped, 5 peering, 13 active 
>>>>> 
>>>>> 
>>>>> AFTER OSD RESTART 
>>>>> ------------------ 
>>>>> 
>>>>> 
>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Best regards, Irek Nurgayazovich Fasikhov 
>>> Mob.: +79229045757 
>>> 
>>> 
>>> 
> 
> 
> 
> 
> 
> 


* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                     ` <195300554.685766176.1430116301190.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-27 15:27                                                                       ` Sage Weil
       [not found]                                                                         ` <alpine.DEB.2.00.1504270827080.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Sage Weil @ 2015-04-27 15:27 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: ceph-users, ceph-devel, Venkateswara Rao Jujjuri, Milosz Tanski


On Mon, 27 Apr 2015, Alexandre DERUMIER wrote:
> >>If I want to use librados API for performance testing, are there any 
> >>existing benchmark tools which directly access librados (not through 
> >>rbd or gateway)? 
> 
> you can use "rados bench" from ceph packages
> 
> http://ceph.com/docs/master/man/8/rados/
> 
> "
> bench seconds mode [ -b objsize ] [ -t threads ]
> Benchmark for seconds. The mode can be write, seq, or rand. seq and rand are read benchmarks, either sequential or random. Before running one of the reading benchmarks, run a write benchmark with the --no-cleanup option. The default object size is 4 MB, and the default number of simulated threads (parallel writes) is 16.
> "

This one creates whole objects.  You might also look at ceph_smalliobench 
(in the ceph-tests package) which is a bit more featureful but less 
friendly to use.
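
For example, a write pass followed by read passes along the lines of the man
page excerpt above might look like this (the pool name is a placeholder):

  # 60s of 4 KB object writes, 16 threads; keep the objects for the read passes
  rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
  # then random and sequential reads against the objects written above
  rados bench -p testpool 60 rand
  rados bench -p testpool 60 seq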

Also, fio has an rbd driver.
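
A minimal job file for that driver could look like the following sketch (pool,
image, and client names are placeholders, not taken from this thread):

  [global]
  ioengine=rbd          # drive librbd directly, no kernel mapping needed
  clientname=admin      # cephx user, i.e. client.admin
  pool=rbd              # pool holding the test image
  rbdname=fio_test      # pre-created RBD image to read from
  rw=randread
  bs=4k
  iodepth=32
  numjobs=10
  group_reporting

  [rbd-randread]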

sage


> 
> 
> ----- Original Message -----
> From: "Venkateswara Rao Jujjuri" <jujjuri-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> To: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
> Cc: "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
> Sent: Monday, 27 April 2015 08:12:49
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
> 
> If I want to use librados API for performance testing, are there any 
> existing benchmark tools which directly access librados (not through 
> rbd or gateway)? 
> 
> Thanks in advance, 
> JV 
> 
> On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER 
> <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> wrote: 
> >>>I'll retest tcmalloc, because I was pretty sure I had patched it correctly. 
> > 
> > Ok, I really think I had patched tcmalloc incorrectly. 
> > I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread). 
> > 
> > So better than jemalloc. 
> > 
> > 
> > ----- Original Message ----- 
> > From: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> 
> > To: "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 
> > Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> 
> > Sent: Monday, 27 April 2015 07:01:21 
> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> > 
> > Hi, 
> > 
> > also another big difference, 
> > 
> > I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc. 
> > 
> > I'll retest tcmalloc, because I was pretty sure I had patched it correctly. 
> > 
> > 
> > ----- Original Message ----- 
> > From: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> 
> > To: "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 
> > Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> 
> > Sent: Saturday, 25 April 2015 06:45:43 
> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> > 
> >>>We haven't done any kind of real testing on jemalloc, so use at your own 
> >>>peril. Having said that, we've also been very interested in hearing 
> >>>community feedback from folks trying it out, so please feel free to give 
> >>>it a shot. :D 
> > 
> > Some feedback: I have run the bench all night, with no speed regression. 
> > 
> > And I see a speed increase with fio with more jobs (with tcmalloc, it seems to be the reverse). 
> > 
> > with tcmalloc : 
> > 
> > 10 fio-rbd jobs = 300k iops 
> > 15 fio-rbd jobs = 290k iops 
> > 20 fio-rbd jobs = 270k iops 
> > 40 fio-rbd jobs = 250k iops 
> > 
> > (all with up and down values during the fio bench) 
> > 
> > 
> > with jemalloc: 
> > 
> > 10 fio-rbd jobs = 300k iops 
> > 15 fio-rbd jobs = 320k iops 
> > 20 fio-rbd jobs = 330k iops 
> > 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client machine, with 20 cores at 100%) 
> > 
> > (all with contant values during the fio bench) 
> > 
> > ----- Original Message ----- 
> > From: "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 
> > To: "Stefan Priebe" <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>, "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> 
> > Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> 
> > Sent: Friday, 24 April 2015 20:02:15 
> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> > 
> > We haven't done any kind of real testing on jemalloc, so use at your own 
> > peril. Having said that, we've also been very interested in hearing 
> > community feedback from folks trying it out, so please feel free to give 
> > it a shot. :D 
> > 
> > Mark 
> > 
> > On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
> >> Is jemalloc recommended in general? Does it also work for firefly? 
> >> 
> >> Stefan 
> >> 
> >> Excuse my typo sent from my mobile phone. 
> >> 
> >> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com 
> >> <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>> wrote: 
> >> 
> >>> Hi, 
> >>> 
> >>> I have finished to rebuild ceph with jemalloc, 
> >>> 
> >>> all seem to working fine. 
> >>> 
> >>> I got a constant 300k iops for the moment, so no speed regression. 
> >>> 
> >>> I'll do more long benchmark next week. 
> >>> 
> >>> Regards, 
> >>> 
> >>> Alexandre 
> >>> 
> >>> ----- Original Message ----- 
> >>> From: "Irek Fasikhov" <malmyzh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <mailto:malmyzh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>> 
> >>> To: "Somnath Roy" <Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org 
> >>> <mailto:Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>> 
> >>> Cc: "aderumier" <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>>, 
> >>> "Mark Nelson" <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org <mailto:mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>>, 
> >>> "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> >>> <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>>, "ceph-devel" 
> >>> <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>>, 
> >>> "Milosz Tanski" <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org <mailto:milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>> 
> >>> Sent: Friday, 24 April 2015 13:37:52 
> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>> daemon improve performance from 100k iops to 300k iops 
> >>> 
> >>> Hi,Alexandre! 
> >>> Have you tried changing the vm.min_free_kbytes parameter? 
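
If anyone wants to experiment with that, a sketch (the value is purely
illustrative, not a recommendation from this thread):

  # reserve more free memory for atomic/kernel allocations
  sysctl -w vm.min_free_kbytes=262144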
> >>> 
> >>> 2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org 
> >>> <mailto:Somnath.Roy-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> > : 
> >>> 
> >>> 
> >>> Alexandre, 
> >>> You can configure with --with-jemalloc or ./do_autogen -J to build 
> >>> ceph with jemalloc. 
> >>> 
> >>> Thanks & Regards 
> >>> Somnath 
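
For illustration, the two build routes mentioned above would look roughly
like this; the preload variant and the library path are assumptions, not
something tested in this thread:

  ./configure --with-jemalloc && make   # autotools build from a release tree
  ./do_autogen.sh -J                    # or from a git checkout
  # or preload jemalloc without rebuilding ceph at all:
  LD_PRELOAD=/usr/lib64/libjemalloc.so.1 /usr/bin/ceph-osd -i 0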
> >>> 
> >>> -----Original Message----- 
> >>> From: ceph-users [mailto: ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> >>> <mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> ] On Behalf Of Alexandre 
> >>> DERUMIER 
> >>> Sent: Thursday, April 23, 2015 4:56 AM 
> >>> To: Mark Nelson 
> >>> Cc: ceph-users; ceph-devel; Milosz Tanski 
> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>> daemon improve performance from 100k iops to 300k iops 
> >>> 
> >>>>> If you have the means to compile the same version of ceph with 
> >>>>> jemalloc, I would be very interested to see how it does. 
> >>> 
> >>> Yes, sure. (I have around 3-4 weeks to do all the benchs) 
> >>> 
> >>> But I don't know how to do it. 
> >>> I'm running the cluster on centos 7.1; maybe it would be easy to patch 
> >>> the SRPMs to rebuild the package with jemalloc. 
> >>> 
> >>> 
> >>> 
> >>> ----- Original Message ----- 
> >>> From: "Mark Nelson" < mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org <mailto:mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > 
> >>> To: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier@odiso.com> >, 
> >>> "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org 
> >>> <mailto:Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> > 
> >>> Cc: "ceph-users" < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> >>> <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> >, "ceph-devel" < 
> >>> ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> >, 
> >>> "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org <mailto:milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> > 
> >>> Sent: Thursday, 23 April 2015 13:33:00 
> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>> daemon improve performance from 100k iops to 300k iops 
> >>> 
> >>> Thanks for the testing Alexandre! 
> >>> 
> >>> If you have the means to compile the same version of ceph with 
> >>> jemalloc, I would be very interested to see how it does. 
> >>> 
> >>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
> >>> will have to deal with it at some point, but perhaps not today. ;) 
> >>> 
> >>> Mark 
> >>> 
> >>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
> >>>> Maybe it's tcmalloc related. 
> >>>> I thought I had patched it correctly, but perf shows a lot of 
> >>>> tcmalloc::ThreadCache::ReleaseToCentralCache 
> >>>> 
> >>>> before osd restart (100k) 
> >>>> ------------------ 
> >>>>  11.66%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
> >>>>   8.51%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans 
> >>>>   3.04%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseToSpans 
> >>>>   2.04%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator new 
> >>>>   1.63%  swapper   [kernel.kallsyms]     [k] intel_idle 
> >>>>   1.35%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseListToSpans 
> >>>>   1.33%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete 
> >>>>   1.07%  ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
> >>>>   0.91%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock 
> >>>>   0.88%  ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back 
> >>>>   0.81%  ceph-osd  ceph-osd              [.] Mutex::Lock 
> >>>>   0.79%  ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string 
> >>>>   0.74%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock 
> >>>>   0.67%  ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock 
> >>>>   0.63%  swapper   [kernel.kallsyms]     [k] native_write_msr_safe 
> >>>>   0.62%  ceph-osd  [kernel.kallsyms]     [k] avc_has_perm_noaudit 
> >>>>   0.58%  ceph-osd  ceph-osd              [.] operator< 
> >>>>   0.57%  ceph-osd  [kernel.kallsyms]     [k] __schedule 
> >>>>   0.57%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu 
> >>>>   0.54%  swapper   [kernel.kallsyms]     [k] __schedule 
> >>>> 
> >>>> 
> >>>> after osd restart (300k iops) 
> >>>> ------------------------------ 
> >>>>   3.47%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator new 
> >>>>   1.92%  ceph-osd  libtcmalloc.so.4.1.2  [.] operator delete 
> >>>>   1.86%  swapper   [kernel.kallsyms]     [k] intel_idle 
> >>>>   1.52%  ceph-osd  libstdc++.so.6.0.19   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string 
> >>>>   1.34%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache 
> >>>>   1.24%  ceph-osd  libc-2.17.so          [.] __memcpy_ssse3_back 
> >>>>   1.23%  ceph-osd  ceph-osd              [.] Mutex::Lock 
> >>>>   1.21%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_trylock 
> >>>>   1.11%  ceph-osd  [kernel.kallsyms]     [k] copy_user_enhanced_fast_string 
> >>>>   0.95%  ceph-osd  libpthread-2.17.so    [.] pthread_mutex_unlock 
> >>>>   0.94%  ceph-osd  [kernel.kallsyms]     [k] _raw_spin_lock 
> >>>>   0.78%  ceph-osd  [kernel.kallsyms]     [k] __d_lookup_rcu 
> >>>>   0.70%  ceph-osd  [kernel.kallsyms]     [k] tcp_sendmsg 
> >>>>   0.70%  ceph-osd  ceph-osd              [.] Message::Message 
> >>>>   0.68%  ceph-osd  [kernel.kallsyms]     [k] __schedule 
> >>>>   0.66%  ceph-osd  [kernel.kallsyms]     [k] idle_cpu 
> >>>>   0.65%  ceph-osd  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans 
> >>>>   0.64%  swapper   [kernel.kallsyms]     [k] native_write_msr_safe 
> >>>>   0.61%  ceph-osd  ceph-osd              [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release 
> >>>>   0.60%  swapper   [kernel.kallsyms]     [k] __schedule 
> >>>>   0.60%  ceph-osd  libstdc++.so.6.0.19   [.] 0x00000000000bdd2b 
> >>>>   0.57%  ceph-osd  ceph-osd              [.] operator< 
> >>>>   0.57%  ceph-osd  ceph-osd              [.] crc32_iscsi_00 
> >>>>   0.56%  ceph-osd  libstdc++.so.6.0.19   [.] std::string::_Rep::_M_dispose 
> >>>>   0.55%  ceph-osd  [kernel.kallsyms]     [k] __switch_to 
> >>>>   0.54%  ceph-osd  libc-2.17.so          [.] vfprintf 
> >>>>   0.52%  ceph-osd  [kernel.kallsyms]     [k] fget_light 
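
The heavy ReleaseToCentralCache/FetchFromSpans time in the slow profile is
consistent with tcmalloc's total thread-cache limit being too small for an
OSD. One commonly suggested mitigation (an assumption here, not something
verified in this thread) is to raise gperftools' limit when starting the
daemon:

  # enlarge tcmalloc's total thread cache (the value is illustrative)
  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0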
> >>>> 
> >>>> ----- Original Message ----- 
> >>>> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> > 
> >>>> To: "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org 
> >>>> <mailto:Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> > 
> >>>> Cc: "ceph-users" < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> >>>> <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> >, "ceph-devel" 
> >>>> < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> >, 
> >>>> "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org <mailto:milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> > 
> >>>> Sent: Thursday, 23 April 2015 10:00:34 
> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>>> daemon improve performance from 100k iops to 300k iops 
> >>>> 
> >>>> Hi, 
> >>>> I'm hitting this bug again today. 
> >>>> 
> >>>> So it doesn't seem to be NUMA related (I have tried flushing the Linux 
> >>>> buffer cache to be sure). 
> >>>> 
> >>>> and tcmalloc is patched (I don't know how to verify that it's ok). 
> >>>> 
> >>>> I haven't restarted the osds yet. 
> >>>> 
> >>>> Maybe some perf traces could be useful? 
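
A simple way to grab one between runs (the osd pid is a placeholder):

  echo 3 > /proc/sys/vm/drop_caches         # flush the page cache first
  perf top -p <osd-pid>                     # live per-symbol view
  perf record -g -p <osd-pid> -- sleep 30   # or record 30s with call graphs
  perf report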
> >>>> 
> >>>> 
> >>>> ----- Original Message ----- 
> >>>> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> > 
> >>>> To: "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org 
> >>>> <mailto:Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> > 
> >>>> Cc: "ceph-users" < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> >>>> <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> >, "ceph-devel" 
> >>>> < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> >, 
> >>>> "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org <mailto:milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> > 
> >>>> Sent: Wednesday, 22 April 2015 18:30:26 
> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>>> daemon improve performance from 100k iops to 300k iops 
> >>>> 
> >>>> Hi, 
> >>>> 
> >>>>>> I feel it is due to a tcmalloc issue 
> >>>> 
> >>>> Indeed, I had patched one of my nodes, but not the other. 
> >>>> So maybe I have hit this bug (but I can't confirm; I don't have 
> >>>> traces). 
> >>>> 
> >>>> But numa interleaving seems to help in my case (maybe not from 
> >>>> 100->300k, but 250k->300k). 
> >>>> 
> >>>> I need to run longer tests to confirm that. 
> >>>> 
> >>>> 
> >>>> ----- Original Message ----- 
> >>>> From: "Srinivasula Maram" < Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org 
> >>>> <mailto:Srinivasula.Maram-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> > 
> >>>> To: "Mark Nelson" < mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org <mailto:mnelson@redhat.com> >, 
> >>>> "aderumier" 
> >>>> < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> >, "Milosz Tanski" 
> >>>> < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org <mailto:milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> > 
> >>>> Cc: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org 
> >>>> <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> >, "ceph-users" 
> >>>> < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> > 
> >>>> Sent: Wednesday, 22 April 2015 16:34:33 
> >>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd 
> >>>> daemon improve performance from 100k iops to 300k iops 
> >>>> 
> >>>> I feel it is due to a tcmalloc issue 
> >>>> 
> >>>> I have seen a similar issue in my setup after 20 days. 
> >>>> 
> >>>> Thanks, 
> >>>> Srinivas 
> >>>> 
> >>>> 
> >>>> 
> >>>> -----Original Message----- 
> >>>> From: ceph-users [mailto: ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org 
> >>>> <mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> ] On Behalf 
> >>>> Of Mark Nelson 
> >>>> Sent: Wednesday, April 22, 2015 7:31 PM 
> >>>> To: Alexandre DERUMIER; Milosz Tanski 
> >>>> Cc: ceph-devel; ceph-users 
> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>>> daemon improve performance from 100k iops to 300k iops 
> >>>> 
> >>>> Hi Alexandre, 
> >>>> 
> >>>> We should discuss this at the perf meeting today. We knew NUMA node 
> >>>> affinity issues were going to crop up sooner or later (and indeed 
> >>>> already have in some cases), but this is pretty major. It's probably 
> >>>> time to really dig in and figure out how to deal with this. 
> >>>> 
> >>>> Note: this is one of the reasons I like small nodes with single 
> >>>> sockets and fewer OSDs. 
> >>>> 
> >>>> Mark 
> >>>> 
> >>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
> >>>>> Hi, 
> >>>>> 
> >>>>> I have done a lot of tests today, and it does indeed seem NUMA related. 
> >>>>> 
> >>>>> My numastat was 
> >>>>> 
> >>>>> # numastat 
> >>>>> node0 node1 
> >>>>> numa_hit 99075422 153976877 
> >>>>> numa_miss 167490965 1493663 
> >>>>> numa_foreign 1493663 167491417 
> >>>>> interleave_hit 157745 167015 
> >>>>> local_node 99049179 153830554 
> >>>>> other_node 167517697 1639986 
> >>>>> 
> >>>>> So, a lot of misses. 
> >>>>> 
> >>>>> In this case, I can reproduce iops going from 85k to 300k, up 
> >>>>> and down. 
> >>>>> 
> >>>>> now setting 
> >>>>> echo 0 > /proc/sys/kernel/numa_balancing 
> >>>>> 
> >>>>> and starting osd daemons with 
> >>>>> 
> >>>>> numactl --interleave=all /usr/bin/ceph-osd 
> >>>>> 
> >>>>> 
> >>>>> I have a constant 300k iops! 
> >>>>> 
> >>>>> 
> >>>>> I wonder if it could be improved by binding osd daemons to specific 
> >>>>> NUMA nodes; see the sketch below. 
> >>>>> I have 2 NUMA nodes of 10 cores each, with 6 osds, but I think it 
> >>>>> would also require tuning the osd thread settings in ceph.conf. 
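
A sketch of that kind of binding (osd ids and node numbers are illustrative
only):

  echo 0 > /proc/sys/kernel/numa_balancing      # keep auto-balancing off
  # pin individual osds (and their memory) to a socket each
  numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0
  numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 1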
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> ----- Original Message ----- 
> >>>>> From: "Milosz Tanski" < milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org <mailto:milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> > 
> >>>>> To: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier@odiso.com> > 
> >>>>> Cc: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org 
> >>>>> <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> >, "ceph-users" 
> >>>>> < ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> > 
> >>>>> Sent: Wednesday, 22 April 2015 12:54:23 
> >>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
> >>>>> daemon improve performance from 100k iops to 300k iops 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER < 
> >>>>> aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> > wrote: 
> >>>>> 
> >>>>> 
> >>>>> I wonder if it could be numa related, 
> >>>>> 
> >>>>> I'm using centos 7.1, 
> >>>>> and automatic NUMA balancing is enabled 
> >>>>> 
> >>>>> cat /proc/sys/kernel/numa_balancing = 1 
> >>>>> 
> >>>>> Maybe the osd daemons access buffers on the wrong NUMA node. 
> >>>>> 
> >>>>> I'll try to reproduce the problem 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> Can you force the degenerate case using numactl, to either affirm or 
> >>>>> deny your suspicion? 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> ----- Original Message ----- 
> >>>>> From: "aderumier" < aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org <mailto:aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org> > 
> >>>>> To: "ceph-devel" < ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org 
> >>>>> <mailto:ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> >, "ceph-users" < 
> >>>>> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> > 
> >>>>> Sent: Wednesday, 22 April 2015 10:40:05 
> >>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon 
> >>>>> improve performance from 100k iops to 300k iops 
> >>>>> 
> >>>>> Hi, 
> >>>>> 
> >>>>> I was doing some benchmarks 
> >>>>> and found a strange behaviour. 
> >>>>> 
> >>>>> Using fio with the rbd engine, I was able to reach around 100k iops. 
> >>>>> (osd data in the linux buffer cache; iostat shows 0% disk access) 
> >>>>> 
> >>>>> then after restarting all osd daemons, 
> >>>>> 
> >>>>> the same fio benchmark now shows around 300k iops. 
> >>>>> (osd data in the linux buffer cache; iostat shows 0% disk access) 
> >>>>> 
> >>>>> 
> >>>>> any ideas? 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> before restarting osd 
> >>>>> --------------------- 
> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
> >>>>> ioengine=rbd, iodepth=32 ... 
> >>>>> fio-2.2.7-10-g51e9 
> >>>>> Starting 10 processes 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 
> >>>>> iops] [eta 14m:45s] 
> >>>>> fio: terminating on signal 2 
> >>>>> 
> >>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 
> >>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt= 
> >>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat 
> >>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec): 
> >>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles 
> >>>>> (usec): 
> >>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247], 
> >>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660], 
> >>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
> >>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
> >>>>> | 99.99th=[47360] 
> >>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82, 
> >>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
> >>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) : 
> >>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01% 
> >>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths : 
> >>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit : 
> >>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
> >>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, 
> >>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, 
> >>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, 
> >>>>> depth=32 
> >>>>> 
> >>>>> Run status group 0 (all jobs): 
> >>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, 
> >>>>> mint=26215msec, maxt=26215msec 
> >>>>> 
> >>>>> Disk stats (read/write): 
> >>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
> >>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
> >>>>> ioengine=rbd, iodepth=32 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> AFTER RESTARTING OSDS 
> >>>>> ---------------------- 
> >>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
> >>>>> ioengine=rbd, iodepth=32 ... 
> >>>>> fio-2.2.7-10-g51e9 
> >>>>> Starting 10 processes 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> rbd engine: RBD version: 0.1.9 
> >>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 
> >>>>> iops] [eta 01h:09m:27s] 
> >>>>> fio: terminating on signal 2 
> >>>>> 
> >>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 
> >>>>> 22 10:02:28 2015 read : io=7655.7MB, bw=1036.8MB/s, iops=265218, 
> >>>>> runt= 7389msec slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
> >>>>> clat 
> >>>>> (usec): min=8, max=684328, avg=930.43, stdev=6419.12 lat (usec): 
> >>>>> min=154, max=684342, avg=957.02, stdev=6419.28 clat percentiles 
> >>>>> (usec): 
> >>>>> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450], 
> >>>>> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796], 
> >>>>> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
> >>>>> | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
> >>>>> | 99.99th=[436224] 
> >>>>> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46, 
> >>>>> stdev=28263.82 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 
> >>>>> 250=1.23% lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% lat (msec) 
> >>>>> : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% lat (msec) : 
> >>>>> 250=0.01%, 500=0.02%, 750=0.01% cpu : usr=44.06%, sys=11.26%, 
> >>>>> ctx=642620, majf=0, minf=6832 IO depths : 1=0.1%, 2=0.5%, 4=2.0%, 
> >>>>> 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 
> >>>>> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 
> >>>>> 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% issued : 
> >>>>> total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : 
> >>>>> target=0, window=0, percentile=100.00%, depth=32 
> >>>>> 
> >>>>> Run status group 0 (all jobs): 
> >>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, 
> >>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
> >>>>> 
> >>>>> Disk stats (read/write): 
> >>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> CEPH LOG 
> >>>>> -------- 
> >>>>> 
> >>>>> before restarting osd 
> >>>>> ---------------------- 
> >>>>> 
> >>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
> >>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
> >>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
> >>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
> >>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
> >>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
> >>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
> >>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
> >>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
> >>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
> >>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
> >>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
> >>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
> >>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
> >>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
> >>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
> >>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
> >>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
> >>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
> >>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
> >>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
> >>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
> >>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
> >>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
> >>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
> >>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
> >>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
> >>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
> >>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
> >>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
> >>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
> >>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
> >>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
> >>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
> >>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
> >>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
> >>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
> >>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
> >>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
> >>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
> >>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
> >>>>> 
> >>>>> 
> >>>>> restarting osd 
> >>>>> --------------- 
> >>>>> 
> >>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
> >>>>> [INF] osd.0 marked itself down 
> >>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
> >>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
> >>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
> >>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
> >>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
> >>>>> 794 
> >>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
> >>>>> kB/s rd, 130 op/s 
> >>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
> >>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
> >>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
> >>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
> >>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
> >>>>> 794 
> >>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
> >>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
> >>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
> >>>>> 794 
> >>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
> >>>>> [INF] osd.1 marked itself down 
> >>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
> >>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
> >>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
> >>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
> >>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
> >>>>> 684 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
> >>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
> >>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
> >>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
> >>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
> >>>>> 684 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
> >>>>> [INF] osd.2 marked itself down 
> >>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
> >>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
> >>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
> >>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
> >>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
> >>>>> 600 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
> >>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
> >>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
> >>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
> >>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
> >>>>> 600 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
> >>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
> >>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
> >>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
> >>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
> >>>>> [INF] osd.1 10.7.0.152:6800/14848 boot 
> >>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
> >>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
> >>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
> >>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
> >>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
> >>>>> 600 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
> >>>>> [INF] osd.3 marked itself down 
> >>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
> >>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
> >>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
> >>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
> >>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 
> >>>>> 506 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
> >>>>> [INF] osd.6 marked itself down 
> >>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
> >>>>> [INF] osd.2 10.7.0.152:6812/15410 boot 
> >>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
> >>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
> >>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
> >>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
> >>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
> >>>>> 398 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
> >>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
> >>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
> >>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
> >>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
> >>>>> 398 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
> >>>>> [INF] osd.3 10.7.0.152:6816/15971 boot 
> >>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
> >>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
> >>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
> >>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
> >>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
> >>>>> 398 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
> >>>>> [INF] osd.7 marked itself down 
> >>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
> >>>>> [INF] osd.0 10.7.0.152:6804/14481 boot 
> >>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
> >>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
> >>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
> >>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
> >>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 
> >>>>> 292 
> >>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
> >>>>> used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
> >>>>> [INF] osd.6 10.7.0.153:6808/21955 boot 
> >>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
> >>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
> >>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
> >>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
> >>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
> >>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
> >>>>> 420 GB used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
> >>>>> [INF] osd.8 marked itself down 
> >>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
> >>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
> >>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
> >>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
> >>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
> >>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
> >>>>> 420 GB used, 874 GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
> >>>>> [INF] osd.7 10.7.0.153:6804/22536 boot 
> >>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
> >>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
> >>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
> >>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
> >>>>> stale+active+remapped, 503 stale+active+clean, 4 
> >>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
> >>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 
> >>>>> GB / 1295 GB avail 
> >>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
> >>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
> >>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
> >>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
> >>>>> stale+active+remapped, 503 stale+active+clean, 4 
> >>>>> active+undersized+degraded+remapped, 5 peering, 13 active 
> >>>>> 
> >>>>> 
> >>>>> AFTER OSD RESTART 
> >>>>> ------------------ 
> >>>>> 
> >>>>> 
> >>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
> >>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
> >>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
> >>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
> >>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
> >>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
> >>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
> >>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
> >>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
> >>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
> >>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
> >>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
> >>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
> >>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
> >>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
> >>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
> >>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
> >>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
> >>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
> >>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
> >>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
> >>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
> >>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
> >>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
> >>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
> >>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
> >>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
> >>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
> >>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
> >>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
> >>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
> >>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
> >>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
> >>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
> >>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> _______________________________________________ 
> >>> ceph-users mailing list 
> >>> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> 
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> >>> _______________________________________________ 
> >>> ceph-users mailing list 
> >>> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org <mailto:ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org> 
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> -- 
> >>> With respect, Irek Fasikhov 
> >>> Tel.: +79229045757 
> >>> 
> >>> 
> >>> 
> > 
> > 
> 
> 
> 
> -- 
> Jvrao 
> --- 
> First they ignore you, then they laugh at you, then they fight you, 
> then you win. - Mahatma Gandhi 
> 
> 

[-- Attachment #2: Type: text/plain, Size: 178 bytes --]

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                         ` <alpine.DEB.2.00.1504270827080.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-04-27 15:50                                                                           ` Dan van der Ster
  0 siblings, 0 replies; 35+ messages in thread
From: Dan van der Ster @ 2015-04-27 15:50 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-users, Milosz Tanski, ceph-devel, Venkateswara Rao Jujjuri

Hi Sage, Alexandre et al.

Here's another data point... we noticed something similar a while ago.

After we restart our OSDs the "4kB object write latency" [1]
temporarily drops from ~8-10ms down to around 3-4ms. Then slowly over
time the latency increases back to 8-10ms. The time that the OSDs stay
with low latency is a function of how much work those OSDs are doing
(i.e. on our idle test cluster, they stay with low latency for a
couple of hours; on our production cluster the latency is high again
pretty much immediately).

We also attributed this to the
tcmalloc::ThreadCache::ReleaseToCentralCache issue, since that
function is always very high %-wise in perf top. Today we finally
managed to get the fixed tcmalloc [2] onto our el6 servers and tried the
larger cache. As we expected, with a 128M cache size [3] the latency
stays low (actually below 3ms on the test cluster vs 9ms earlier
today).

We should probably send a patch to make this configurable as an init script option.

Cheers, Dan


[1] rados bench -p test -b 4096 -t 1
[2] rpmbuild --rebuild
https://kojipkgs.fedoraproject.org//packages/gperftools/2.4/1.fc23/src/gperftools-2.4-1.fc23.src.rpm
[3]

--- /tmp/ceph 2015-04-27 17:43:56.726216645 +0200
+++ /etc/init.d/ceph 2015-04-27 17:21:58.567859403 +0200
@@ -306,7 +306,7 @@
     if [ -n "$SYSTEMD_RUN" ]; then
  cmd="$SYSTEMD_RUN -r bash -c '$files $cmd --cluster $cluster -f'"
     else
- cmd="$files $wrap $cmd --cluster $cluster $runmode"
+ cmd="export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728; $files
$wrap $cmd --cluster $cluster $runmode"
     fi

     if [ $dofsmount -eq 1 ] && [ -n "$fs_devs" ]; then
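
For a quick manual test you can get the same effect without touching the
init script by exporting the variable before starting a daemon by hand;
a minimal sketch (the osd id is just an example):

  export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728  # 128M cache
  /usr/bin/ceph-osd -i 0 -f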

On Mon, Apr 27, 2015 at 5:27 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 27 Apr 2015, Alexandre DERUMIER wrote:
>> >>If I want to use the librados API for performance testing, are there any
>> >>existing benchmark tools which directly access librados (not through
>> >>rbd or the gateway)?
>>
>> you can use "rados bench" from ceph packages
>>
>> http://ceph.com/docs/master/man/8/rados/
>>
>> "
>> bench seconds mode [ -b objsize ] [ -t threads ]
>> Benchmark for seconds. The mode can be write, seq, or rand. seq and rand are read benchmarks, either sequential or random. Before running one of the reading benchmarks, run a write benchmark with the --no-cleanup option. The default object size is 4 MB, and the default number of simulated threads (parallel writes) is 16.
>> "
>
> This one creates whole objects.  You might also look at ceph_smalliobench
> (in the ceph-tests package), which is a bit more featureful but less
> friendly to use.
>
> Also, fio has an rbd driver.
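> 
> A minimal invocation of the rbd engine looks something like this (pool
> and image names are placeholders):
> 
> fio --name=rbdtest --ioengine=rbd --clientname=admin --pool=rbd \
>     --rbdname=testimg --rw=randread --bs=4k --iodepth=32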
>
> sage
>
>
>>
>>
>> ----- Original Message -----
>> From: "Venkateswara Rao Jujjuri" <jujjuri@gmail.com>
>> To: "aderumier" <aderumier@odiso.com>
>> Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Monday 27 April 2015 08:12:49
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>
>> If I want to use the librados API for performance testing, are there any
>> existing benchmark tools which directly access librados (not through
>> rbd or the gateway)?
>>
>> Thanks in advance,
>> JV
>>
>> On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER
>> <aderumier@odiso.com> wrote:
>> >>>I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>> >
>> > Ok, I really think I have patched tcmalloc wrongly.
>> > I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>> >
>> > So better than jemalloc.
>> >
>> >
>> > ----- Original Message -----
>> > From: "aderumier" <aderumier@odiso.com>
>> > To: "Mark Nelson" <mnelson@redhat.com>
>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> > Sent: Monday 27 April 2015 07:01:21
>> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >
>> > Hi,
>> >
>> > also another big difference,
>> >
>> > I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>> >
>> > I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>> >
>> >
>> > ----- Original Message -----
>> > From: "aderumier" <aderumier@odiso.com>
>> > To: "Mark Nelson" <mnelson@redhat.com>
>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> > Sent: Saturday 25 April 2015 06:45:43
>> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >
>> >>>We haven't done any kind of real testing on jemalloc, so use at your own
>> >>>peril. Having said that, we've also been very interested in hearing
>> >>>community feedback from folks trying it out, so please feel free to give
>> >>>it a shot. :D
>> >
>> > Some feedback: I have run benchmarks all night, with no speed regression.
>> >
>> > And I see a speed increase with fio with more jobs. (with tcmalloc, it seems to be the reverse)
>> >
>> > with tcmalloc :
>> >
>> > 10 fio-rbd jobs = 300k iops
>> > 15 fio-rbd jobs = 290k iops
>> > 20 fio-rbd jobs = 270k iops
>> > 40 fio-rbd jobs = 250k iops
>> >
>> > (all with up and down values during the fio bench)
>> >
>> >
>> > with jemalloc:
>> >
>> > 10 fio-rbd jobs = 300k iops
>> > 15 fio-rbd jobs = 320k iops
>> > 20 fio-rbd jobs = 330k iops
>> > 40 fio-rbd jobs = 370k iops (can get more currently, only 1 client machine with 20cores 100%)
>> >
>> > (all with constant values during the fio bench)
>> >
>> > ----- Original Message -----
>> > From: "Mark Nelson" <mnelson@redhat.com>
>> > To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
>> > Sent: Friday 24 April 2015 20:02:15
>> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >
>> > We haven't done any kind of real testing on jemalloc, so use at your own
>> > peril. Having said that, we've also been very interested in hearing
>> > community feedback from folks trying it out, so please feel free to give
>> > it a shot. :D
>> >
>> > Mark
>> >
>> > On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>> >> Is jemalloc recommended in general? Does it also work for firefly?
>> >>
>> >> Stefan
>> >>
>> >> Excuse my typo sent from my mobile phone.
>> >>
>> >> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I have finished rebuilding ceph with jemalloc,
>> >>>
>> >>> all seems to be working fine.
>> >>>
>> >>> I got a constant 300k iops for the moment, so no speed regression.
>> >>>
>> >>> I'll do longer benchmarks next week.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Alexandre
>> >>>
>> >>> ----- Original Message -----
>> >>> From: "Irek Fasikhov" <malmyzh@gmail.com>
>> >>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>> >>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>,
>> >>> "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>> >>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>> Sent: Friday 24 April 2015 13:37:52
>> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>> daemon improve performance from 100k iops to 300k iops
>> >>>
>> >>> Hi, Alexandre!
>> >>> Have you tried changing the vm.min_free_kbytes parameter?
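>> >>> 
>> >>> For example (the value here is only an illustration):
>> >>> 
>> >>> sysctl -w vm.min_free_kbytes=262144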
>> >>>
>> >>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>> >>>
>> >>>
>> >>> Alexandre,
>> >>> You can configure with --with-jemalloc or ./do_autogen -J to build
>> >>> ceph with jemalloc.
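>> >>> 
>> >>> For example, something like:
>> >>> 
>> >>> ./autogen.sh
>> >>> ./configure --with-jemalloc
>> >>> make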
>> >>>
>> >>> Thanks & Regards
>> >>> Somnath
>> >>>
>> >>> -----Original Message-----
>> >>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf
>> >>> Of Alexandre DERUMIER
>> >>> Sent: Thursday, April 23, 2015 4:56 AM
>> >>> To: Mark Nelson
>> >>> Cc: ceph-users; ceph-devel; Milosz Tanski
>> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>> daemon improve performance from 100k iops to 300k iops
>> >>>
>> >>>>> If you have the means to compile the same version of ceph with
>> >>>>> jemalloc, I would be very interested to see how it does.
>> >>>
>> >>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks)
>> >>>
>> >>> But I don't know how to do it.
>> >>> I'm running the cluster on CentOS 7.1; maybe it would be easy to patch
>> >>> the srpms to rebuild the package with jemalloc.
>> >>>
>> >>>
>> >>>
>> >>> ----- Original Message -----
>> >>> From: "Mark Nelson" <mnelson@redhat.com>
>> >>> To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram"
>> >>> <Srinivasula.Maram@sandisk.com>
>> >>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>> >>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>> Sent: Thursday 23 April 2015 13:33:00
>> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>> daemon improve performance from 100k iops to 300k iops
>> >>>
>> >>> Thanks for the testing, Alexandre!
>> >>>
>> >>> If you have the means to compile the same version of ceph with
>> >>> jemalloc, I would be very interested to see how it does.
>> >>>
>> >>> In some ways I'm glad it turned out not to be NUMA. I still suspect we
>> >>> will have to deal with it at some point, but perhaps not today. ;)
>> >>>
>> >>> Mark
>> >>>
>> >>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>> >>>> Maybe it's tcmalloc related.
>> >>>> I thought I had patched it correctly, but perf shows a lot of
>> >>>> tcmalloc::ThreadCache::ReleaseToCentralCache:
>> >>>>
>> >>>> before osd restart (100k)
>> >>>> ------------------
>> >>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>> >>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>> >>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>> >>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>> >>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle
>> >>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>> >>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>> >>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>> >>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>> >>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>> >>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock
>> >>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>> >>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>> >>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>> >>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>> >>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
>> >>>>  0.58% ceph-osd ceph-osd             [.] operator<
>> >>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
>> >>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>> >>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule
>> >>>>
>> >>>>
>> >>>> after osd restart (300k iops)
>> >>>> ------------------------------
>> >>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>> >>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>> >>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle
>> >>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>> >>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>> >>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>> >>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock
>> >>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>> >>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>> >>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>> >>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>> >>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>> >>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
>> >>>>  0.70% ceph-osd ceph-osd             [.] Message::Message
>> >>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
>> >>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
>> >>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>> >>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>> >>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>> >>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule
>> >>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
>> >>>>  0.57% ceph-osd ceph-osd             [.] operator<
>> >>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
>> >>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
>> >>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
>> >>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf
>> >>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> From: "aderumier" <aderumier@odiso.com>
>> >>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>> >>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>>> Sent: Thursday 23 April 2015 10:00:34
>> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>>> daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> Hi,
>> >>>> I'm hitting this bug again today.
>> >>>>
>> >>>> So it doesn't seem to be numa related (I have tried flushing the linux
>> >>>> buffers to be sure).
>> >>>>
>> >>>> And tcmalloc is patched (I don't know how to verify that it's ok).
>> >>>>
>> >>>> I haven't restarted the osds yet.
>> >>>>
>> >>>> Maybe some perf trace could be useful?
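>> >>>> 
>> >>>> For example, something like (the pid lookup is just a sketch):
>> >>>> 
>> >>>> perf top -p $(pidof ceph-osd | tr ' ' ',')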
>> >>>>
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> From: "aderumier" <aderumier@odiso.com>
>> >>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>> >>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>>> Sent: Wednesday 22 April 2015 18:30:26
>> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>>> daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>>>> I feel it is due to the tcmalloc issue.
>> >>>>
>> >>>> Indeed, I had patched one of my nodes, but not the other.
>> >>>> So maybe I have hit this bug (but I can't confirm, I don't have
>> >>>> traces).
>> >>>>
>> >>>> But numa interleaving seems to help in my case (maybe not from
>> >>>> 100->300k, but 250k->300k).
>> >>>>
>> >>>> I need to run longer tests to confirm that.
>> >>>>
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier"
>> >>>> <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
>> >>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users"
>> >>>> <ceph-users@lists.ceph.com>
>> >>>> Sent: Wednesday 22 April 2015 16:34:33
>> >>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd
>> >>>> daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> I feel it is due to the tcmalloc issue.
>> >>>>
>> >>>> I have seen a similar issue in my setup after 20 days.
>> >>>>
>> >>>> Thanks,
>> >>>> Srinivas
>> >>>>
>> >>>>
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf
>> >>>> Of Mark Nelson
>> >>>> Sent: Wednesday, April 22, 2015 7:31 PM
>> >>>> To: Alexandre DERUMIER; Milosz Tanski
>> >>>> Cc: ceph-devel; ceph-users
>> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>>> daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> Hi Alexandre,
>> >>>>
>> >>>> We should discuss this at the perf meeting today. We knew NUMA node
>> >>>> affinity issues were going to crop up sooner or later (and indeed
>> >>>> already have in some cases), but this is pretty major. It's probably
>> >>>> time to really dig in and figure out how to deal with this.
>> >>>>
>> >>>> Note: this is one of the reasons I like small nodes with single
>> >>>> sockets and fewer OSDs.
>> >>>>
>> >>>> Mark
>> >>>>
>> >>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have done a lot of tests today, and it seems indeed numa related.
>> >>>>>
>> >>>>> My numastat was
>> >>>>>
>> >>>>> # numastat
>> >>>>> node0 node1
>> >>>>> numa_hit 99075422 153976877
>> >>>>> numa_miss 167490965 1493663
>> >>>>> numa_foreign 1493663 167491417
>> >>>>> interleave_hit 157745 167015
>> >>>>> local_node 99049179 153830554
>> >>>>> other_node 167517697 1639986
>> >>>>>
>> >>>>> So, a lot of misses.
>> >>>>>
>> >>>>> In this case, I can reproduce iops going from 85k to 300k, up
>> >>>>> and down.
>> >>>>>
>> >>>>> now setting
>> >>>>> echo 0 > /proc/sys/kernel/numa_balancing
>> >>>>>
>> >>>>> and starting osd daemons with
>> >>>>>
>> >>>>> numactl --interleave=all /usr/bin/ceph-osd
>> >>>>>
>> >>>>>
>> >>>>> I have a constant 300k iops !
>> >>>>>
>> >>>>>
>> >>>>> I wonder if it could be improved by binding osd daemons to a specific
>> >>>>> numa node.
>> >>>>> I have 2 numa nodes of 10 cores with 6 osds, but I think it would also
>> >>>>> require tuning the osd thread counts in ceph.conf.
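>> >>>>> 
>> >>>>> For example, per daemon, something like (osd id and node number are
>> >>>>> only illustrative):
>> >>>>> 
>> >>>>> numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 -f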
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> ----- Original Message -----
>> >>>>> From: "Milosz Tanski" <milosz@adfin.com>
>> >>>>> To: "aderumier" <aderumier@odiso.com>
>> >>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users"
>> >>>>> <ceph-users@lists.ceph.com>
>> >>>>> Sent: Wednesday 22 April 2015 12:54:23
>> >>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> >>>>> daemon improve performance from 100k iops to 300k iops
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER
>> >>>>> <aderumier@odiso.com> wrote:
>> >>>>>
>> >>>>>
>> >>>>> I wonder if it could be numa related,
>> >>>>>
>> >>>>> I'm using centos 7.1,
>> >>>>> and auto numa balancing is enabled
>> >>>>>
>> >>>>> cat /proc/sys/kernel/numa_balancing = 1
>> >>>>>
>> >>>>> Maybe the osd daemon accesses buffers on the wrong numa node.
>> >>>>>
>> >>>>> I'll try to reproduce the problem
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Can you force the degenerate case using numactl, to either affirm or
>> >>>>> deny your suspicion?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> ----- Original Message -----
>> >>>>> From: "aderumier" <aderumier@odiso.com>
>> >>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users"
>> >>>>> <ceph-users@lists.ceph.com>
>> >>>>> Sent: Wednesday 22 April 2015 10:40:05
>> >>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon
>> >>>>> improve performance from 100k iops to 300k iops
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I was doing some benchmarks,
>> >>>>> I have found a strange behaviour.
>> >>>>>
>> >>>>> Using fio with rbd engine, I was able to reach around 100k iops.
>> >>>>> (osd data in linux buffers, iostat shows 0% disk access)
>> >>>>>
>> >>>>> then after restarting all osd daemons,
>> >>>>>
>> >>>>> the same fio benchmark now shows around 300k iops.
>> >>>>> (osd data in linux buffers, iostat shows 0% disk access)
>> >>>>>
>> >>>>>
>> >>>>> any ideas?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> before restarting osd
>> >>>>> ---------------------
>> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>> >>>>> ioengine=rbd, iodepth=32 ...
>> >>>>> fio-2.2.7-10-g51e9
>> >>>>> Starting 10 processes
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0
>> >>>>> iops] [eta 14m:45s]
>> >>>>> fio: terminating on signal 2
>> >>>>>
>> >>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr
>> >>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt=
>> >>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat
>> >>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec):
>> >>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles
>> >>>>> (usec):
>> >>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
>> >>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
>> >>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
>> >>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
>> >>>>> | 99.99th=[47360]
>> >>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82,
>> >>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%,
>> >>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) :
>> >>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01%
>> >>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths :
>> >>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit :
>> >>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> >>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%,
>> >>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0,
>> >>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%,
>> >>>>> depth=32
>> >>>>>
>> >>>>> Run status group 0 (all jobs):
>> >>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s,
>> >>>>> mint=26215msec, maxt=26215msec
>> >>>>>
>> >>>>> Disk stats (read/write):
>> >>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>> >>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>> >>>>> ioengine=rbd, iodepth=32
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> AFTER RESTARTING OSDS
>> >>>>> ----------------------
>> >>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>> >>>>> ioengine=rbd, iodepth=32 ...
>> >>>>> fio-2.2.7-10-g51e9
>> >>>>> Starting 10 processes
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0
>> >>>>> iops] [eta 01h:09m:27s]
>> >>>>> fio: terminating on signal 2
>> >>>>>
>> >>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr
>> >>>>> 22 10:02:28 2015 read : io=7655.7MB, bw=1036.8MB/s, iops=265218,
>> >>>>> runt= 7389msec slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
>> >>>>> clat
>> >>>>> (usec): min=8, max=684328, avg=930.43, stdev=6419.12 lat (usec):
>> >>>>> min=154, max=684342, avg=957.02, stdev=6419.28 clat percentiles
>> >>>>> (usec):
>> >>>>> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450],
>> >>>>> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796],
>> >>>>> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
>> >>>>> | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
>> >>>>> | 99.99th=[436224]
>> >>>>> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46,
>> >>>>> stdev=28263.82 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%,
>> >>>>> 250=1.23% lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% lat (msec)
>> >>>>> : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% lat (msec) :
>> >>>>> 250=0.01%, 500=0.02%, 750=0.01% cpu : usr=44.06%, sys=11.26%,
>> >>>>> ctx=642620, majf=0, minf=6832 IO depths : 1=0.1%, 2=0.5%, 4=2.0%,
>> >>>>> 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%,
>> >>>>> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%,
>> >>>>> 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% issued :
>> >>>>> total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency :
>> >>>>> target=0, window=0, percentile=100.00%, depth=32
>> >>>>>
>> >>>>> Run status group 0 (all jobs):
>> >>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
>> >>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>> >>>>>
>> >>>>> Disk stats (read/write):
>> >>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> CEPH LOG
>> >>>>> --------
>> >>>>>
>> >>>>> before restarting osd
>> >>>>> ----------------------
>> >>>>>
>> >>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster
>> >>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s
>> >>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster
>> >>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s
>> >>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster
>> >>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s
>> >>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster
>> >>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s
>> >>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster
>> >>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s
>> >>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster
>> >>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s
>> >>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster
>> >>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s
>> >>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster
>> >>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s
>> >>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster
>> >>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s
>> >>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster
>> >>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s
>> >>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster
>> >>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s
>> >>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster
>> >>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s
>> >>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster
>> >>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s
>> >>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster
>> >>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s
>> >>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster
>> >>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s
>> >>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster
>> >>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s
>> >>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster
>> >>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s
>> >>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster
>> >>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s
>> >>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster
>> >>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s
>> >>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster
>> >>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>> >>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s
>> >>>>>
>> >>>>>
>> >>>>> restarting osd
>> >>>>> ---------------
>> >>>>>
>> >>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
>> >>>>> [INF] osd.0 marked itself down
>> >>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
>> >>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
>> >>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
>> >>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>> >>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516
>> >>>>> kB/s rd, 130 op/s
>> >>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
>> >>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
>> >>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
>> >>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>> >>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
>> >>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
>> >>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>> >>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
>> >>>>> [INF] osd.1 marked itself down
>> >>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
>> >>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
>> >>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
>> >>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>> >>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
>> >>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
>> >>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
>> >>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>> >>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
>> >>>>> [INF] osd.2 marked itself down
>> >>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
>> >>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
>> >>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
>> >>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>> >>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
>> >>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
>> >>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
>> >>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>> >>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
>> >>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
>> >>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
>> >>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>> >>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
>> >>>>> [INF] osd.1 10.7.0.152:6800/14848 boot
>> >>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
>> >>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
>> >>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
>> >>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>> >>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
>> >>>>> [INF] osd.3 marked itself down
>> >>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
>> >>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
>> >>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
>> >>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped,
>> >>>>> 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
>> >>>>> [INF] osd.6 marked itself down
>> >>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
>> >>>>> [INF] osd.2 10.7.0.152:6812/15410 boot
>> >>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
>> >>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
>> >>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
>> >>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>> >>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
>> >>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
>> >>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
>> >>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>> >>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
>> >>>>> [INF] osd.3 10.7.0.152:6816/15971 boot
>> >>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
>> >>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
>> >>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
>> >>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>> >>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
>> >>>>> [INF] osd.7 marked itself down
>> >>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
>> >>>>> [INF] osd.0 10.7.0.152:6804/14481 boot
>> >>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
>> >>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
>> >>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
>> >>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped,
>> >>>>> 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>> >>>>> used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
>> >>>>> [INF] osd.6 10.7.0.153:6808/21955 boot
>> >>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
>> >>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
>> >>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
>> >>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28
>> >>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
>> >>>>> 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
>> >>>>> [INF] osd.8 marked itself down
>> >>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
>> >>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
>> >>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
>> >>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11
>> >>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
>> >>>>> 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
>> >>>>> [INF] osd.7 10.7.0.153:6804/22536 boot
>> >>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
>> >>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
>> >>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
>> >>>>> stale+active+remapped, 503 stale+active+clean, 4
>> >>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
>> >>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874
>> >>>>> GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
>> >>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
>> >>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
>> >>>>> stale+active+remapped, 503 stale+active+clean, 4
>> >>>>> active+undersized+degraded+remapped, 5 peering, 13 active
>> >>>>>
>> >>>>>
>> >>>>> AFTER OSD RESTART
>> >>>>> ------------------
>> >>>>>
>> >>>>>
>> >>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
>> >>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s
>> >>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
>> >>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s
>> >>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
>> >>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s
>> >>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
>> >>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s
>> >>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
>> >>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s
>> >>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
>> >>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s
>> >>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
>> >>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s
>> >>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
>> >>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s
>> >>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
>> >>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s
>> >>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
>> >>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s
>> >>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
>> >>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s
>> >>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
>> >>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s
>> >>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
>> >>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s
>> >>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
>> >>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s
>> >>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
>> >>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s
>> >>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
>> >>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s
>> >>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
>> >>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
>> >>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>> >>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> ceph-users mailing list
>> >>>>> ceph-users@lists.ceph.com
>> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> With respect, Irek Fasikhov
>> >>> Tel.: +79229045757
>> >>>
>> >>>
>> >>>
>> >
>> >
>>
>>
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you,
>> then you win. - Mahatma Gandhi
>>
>>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-27 15:11                                                                     ` Alexandre DERUMIER
@ 2015-04-27 16:34                                                                       ` Mark Nelson
       [not found]                                                                         ` <553E652A.2060607-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Nelson @ 2015-04-27 16:34 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-users, ceph-devel, Milosz Tanski



On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
>>> Is it possible that you were suffering from the bug during the first
>>> test but once reinstalled you hadn't hit it yet?
>
> yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
> I had patched it, but I think it was not enough.
> I always hit this bug at random, but mainly when I have a "lot" of concurrent clients (20-40).
> The more clients, the lower the iops.
>
>
> Today I tried starting the osds with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M,
> and now it's working fine in all my benchmarks.
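>
> Concretely, something like this per daemon (the osd id is illustrative):
>
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0 -f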
>
>
>>> That's a pretty major
>>> performance swing.  I'm not sure if we can draw any conclusions about
>>> jemalloc vs tcmalloc until we can figure out what went wrong.
>
>  From my bench, jemalloc uses a little bit more cpu than tcmalloc (maybe 1% or 2%).
> tcmalloc seems to work better, with correct tuning of the thread cache bytes.
>
>
> But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
> Maybe Somnath can tell us?

Ok, just to make sure that I understand:

tcmalloc un-tuned: ~50k IOPS once the bug sets in
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
jemalloc un-tuned: ~150k IOPS

Is that correct?  Are there configurations/results I'm missing?

Mark

>
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@redhat.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Monday 27 April 2015 16:54:34
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> Hi Alex,
>
> Is it possible that you were suffering from the bug during the first
> test but once reinstalled you hadn't hit it yet? That's a pretty major
> performance swing. I'm not sure if we can draw any conclusions about
> jemalloc vs tcmalloc until we can figure out what went wrong.
>
> Mark
>
> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
>>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>
>> Ok, I really think I have patched tcmalloc wrongly.
>> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>>
>> So better than jemalloc.
>>
>>
>> ----- Mail original -----
>> De: "aderumier" <aderumier@odiso.com>
>> À: "Mark Nelson" <mnelson@redhat.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Envoyé: Lundi 27 Avril 2015 07:01:21
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>
>> Hi,
>>
>> also another big difference,
>>
>> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>>
>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>
>>
>> ----- Mail original -----
>> De: "aderumier" <aderumier@odiso.com>
>> À: "Mark Nelson" <mnelson@redhat.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Envoyé: Samedi 25 Avril 2015 06:45:43
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>
>>>> We haven't done any kind of real testing on jemalloc, so use at your own
>>>> peril. Having said that, we've also been very interested in hearing
>>>> community feedback from folks trying it out, so please feel free to give
>>>> it a shot. :D
>>
>> Some feedback: I have run benchmarks all night, with no speed regression.
>>
>> And I get a speed increase with fio with more jobs. (with tcmalloc, it seems to be the reverse)
>>
>> with tcmalloc :
>>
>> 10 fio-rbd jobs = 300k iops
>> 15 fio-rbd jobs = 290k iops
>> 20 fio-rbd jobs = 270k iops
>> 40 fio-rbd jobs = 250k iops
>>
>> (all with up and down values during the fio bench)
>>
>>
>> with jemalloc:
>>
>> 10 fio-rbd jobs = 300k iops
>> 15 fio-rbd jobs = 320k iops
>> 20 fio-rbd jobs = 330k iops
>> 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client machine, with 20 cores at 100%)
>>
>> (all with constant values during the fio bench)
>>
>> ----- Mail original -----
>> De: "Mark Nelson" <mnelson@redhat.com>
>> À: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
>> Envoyé: Vendredi 24 Avril 2015 20:02:15
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>
>> We haven't done any kind of real testing on jemalloc, so use at your own
>> peril. Having said that, we've also been very interested in hearing
>> community feedback from folks trying it out, so please feel free to give
>> it a shot. :D
>>
>> Mark
>>
>> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>>> Is jemalloc recommended in general? Does it also work for firefly?
>>>
>>> Stefan
>>>
>>> Excuse my typo sent from my mobile phone.
>>>
>>> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have finished rebuilding ceph with jemalloc,
>>>>
>>>> and everything seems to be working fine.
>>>>
>>>> I got a constant 300k iops for the moment, so no speed regression.
>>>>
>>>> I'll do longer benchmarks next week.
>>>>
>>>> Regards,
>>>>
>>>> Alexandre
>>>>
>>>> ----- Mail original -----
>>>> De: "Irek Fasikhov" <malmyzh@gmail.com <mailto:malmyzh@gmail.com>>
>>>> À: "Somnath Roy" <Somnath.Roy@sandisk.com
>>>> <mailto:Somnath.Roy@sandisk.com>>
>>>> Cc: "aderumier" <aderumier@odiso.com <mailto:aderumier@odiso.com>>,
>>>> "Mark Nelson" <mnelson@redhat.com <mailto:mnelson@redhat.com>>,
>>>> "ceph-users" <ceph-users@lists.ceph.com
>>>> <mailto:ceph-users@lists.ceph.com>>, "ceph-devel"
>>>> <ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>>,
>>>> "Milosz Tanski" <milosz@adfin.com <mailto:milosz@adfin.com>>
>>>> Envoyé: Vendredi 24 Avril 2015 13:37:52
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi, Alexandre!
>>>> Have you tried changing the vm.min_free_kbytes parameter?
>>>>
>>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>>>>
>>>>
>>>> Alexandre,
>>>> You can configure with --with-jemalloc or ./do_autogen -J to build
>>>> ceph with jemalloc.
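>>>>
>>>> For example, a rough sketch of the from-source build (untested; the steps around --with-jemalloc are assumptions):
>>>>
>>>>   ./autogen.sh
>>>>   ./configure --with-jemalloc
>>>>   make -j$(nproc)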
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre
>>>> DERUMIER
>>>> Sent: Thursday, April 23, 2015 4:56 AM
>>>> To: Mark Nelson
>>>> Cc: ceph-users; ceph-devel; Milosz Tanski
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>>>> If you have the means to compile the same version of ceph with
>>>>>> jemalloc, I would be very interested to see how it does.
>>>>
>>>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks)
>>>>
>>>> But I don't know how to do it.
>>>> I'm running the cluster on centos 7.1; maybe it would be easy to patch
>>>> the srpms to rebuild the package with jemalloc (a rough sketch below).
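>>>>
>>>> Something like this, perhaps (untested; the package name and the spec edit are assumptions):
>>>>
>>>>   rpm -i ceph-*.el7.src.rpm
>>>>   # edit ~/rpmbuild/SPECS/ceph.spec so %configure gets --with-jemalloc
>>>>   rpmbuild -ba ~/rpmbuild/SPECS/ceph.spec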
>>>>
>>>>
>>>>
>>>> ----- Mail original -----
>>>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >
>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >,
>>>> "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" <
>>>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>> Envoyé: Jeudi 23 Avril 2015 13:33:00
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Thanks for the testing Alexandre!
>>>>
>>>> If you have the means to compile the same version of ceph with
>>>> jemalloc, I would be very interested to see how it does.
>>>>
>>>> In some ways I'm glad it turned out not to be NUMA. I still suspect we
>>>> will have to deal with it at some point, but perhaps not today. ;)
>>>>
>>>> Mark
>>>>
>>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>>>> Maybe it's tcmalloc related.
>>>>> I thought I had patched it correctly, but perf shows a lot of
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>
>>>>> before osd restart (100k)
>>>>> ------------------
>>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle
>>>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock
>>>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
>>>>>  0.58% ceph-osd ceph-osd             [.] operator<
>>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule
>>>>>
>>>>>
>>>>> after osd restart (300k iops)
>>>>> ------------------------------
>>>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle
>>>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock
>>>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
>>>>>  0.70% ceph-osd ceph-osd             [.] Message::Message
>>>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
>>>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule
>>>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
>>>>>  0.57% ceph-osd ceph-osd             [.] operator<
>>>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
>>>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
>>>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
>>>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf
>>>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Envoyé: Jeudi 23 Avril 2015 10:00:34
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi,
>>>>> I'm hitting this bug again today.
>>>>>
>>>>> So it doesn't seem to be numa related (I have tried flushing the linux
>>>>> buffers to be sure).
>>>>>
>>>>> and tcmalloc is patched (though I don't know how to verify that it's ok).
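>>>>>
>>>>> One quick sanity check of which tcmalloc build the osd actually loads (a sketch; this shows the library and package version, not whether the patch is in it):
>>>>>
>>>>>   ldd /usr/bin/ceph-osd | grep tcmalloc
>>>>>   rpm -q gperftools-libs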
>>>>>
>>>>> I haven't restarted the osd yet.
>>>>>
>>>>> Maybe some perf traces could be useful?
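>>>>>
>>>>> For example, something like this against the slow osd (a sketch; substitute the real pid):
>>>>>
>>>>>   perf record -g -p <osd-pid> -- sleep 30
>>>>>   perf report --sort symbol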
>>>>>
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Envoyé: Mercredi 22 Avril 2015 18:30:26
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi,
>>>>>
>>>>>>> I feel it is due to tcmalloc issue
>>>>>
>>>>> Indeed, I had patched one of my nodes, but not the other.
>>>>> So maybe I have hit this bug. (but I can't confirm; I don't have
>>>>> traces).
>>>>>
>>>>> But numa interleaving seems to help in my case (maybe not from
>>>>> 100->300k, but 250k->300k).
>>>>>
>>>>> I need to do longer tests to confirm that.
>>>>>
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> >,
>>>>> "aderumier"
>>>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski"
>>>>> < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>> Envoyé: Mercredi 22 Avril 2015 16:34:33
>>>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> I feel it is due to the tcmalloc issue.
>>>>>
>>>>> I have seen a similar issue in my setup after 20 days.
>>>>>
>>>>> Thanks,
>>>>> Srinivas
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
>>>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>>>> To: Alexandre DERUMIER; Milosz Tanski
>>>>> Cc: ceph-devel; ceph-users
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi Alexandre,
>>>>>
>>>>> We should discuss this at the perf meeting today. We knew NUMA node
>>>>> affinity issues were going to crop up sooner or later (and indeed
>>>>> already have in some cases), but this is pretty major. It's probably
>>>>> time to really dig in and figure out how to deal with this.
>>>>>
>>>>> Note: this is one of the reasons I like small nodes with single
>>>>> sockets and fewer OSDs.
>>>>>
>>>>> Mark
>>>>>
>>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have done a lot of tests today, and it does indeed seem numa related.
>>>>>>
>>>>>> My numastat was
>>>>>>
>>>>>> # numastat
>>>>>>                      node0        node1
>>>>>> numa_hit          99075422    153976877
>>>>>> numa_miss        167490965      1493663
>>>>>> numa_foreign       1493663    167491417
>>>>>> interleave_hit      157745       167015
>>>>>> local_node        99049179    153830554
>>>>>> other_node       167517697      1639986
>>>>>>
>>>>>> So, a lot of misses.
>>>>>>
>>>>>> In this case, I can reproduce iops going from 85k to 300k, up
>>>>>> and down.
>>>>>>
>>>>>> now setting
>>>>>> echo 0 > /proc/sys/kernel/numa_balancing
>>>>>>
>>>>>> and starting osd daemons with
>>>>>>
>>>>>> numactl --interleave=all /usr/bin/ceph-osd
>>>>>>
>>>>>>
>>>>>> I have a constant 300k iops !
>>>>>>
>>>>>>
>>>>>> I wonder if it could be improved by binding osd daemons to specific
>>>>>> numa nodes, as in the sketch below.
>>>>>> I have 2 numa nodes of 10 cores each, with 6 osds, but I think it would also
>>>>>> require tuning the osd thread counts in ceph.conf.
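>>>>>>
>>>>>> A rough sketch of such pinning (untested; osd ids and node numbers are assumptions):
>>>>>>
>>>>>>   # keep each osd's cpus and memory on the same socket
>>>>>>   numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0
>>>>>>   numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 3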
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>> Envoyé: Mercredi 22 Avril 2015 12:54:23
>>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>>>>
>>>>>>
>>>>>> I wonder if it could be numa related,
>>>>>>
>>>>>> I'm using centos 7.1,
>>>>>> and auto numa balancing is enabled
>>>>>>
>>>>>> cat /proc/sys/kernel/numa_balancing = 1
>>>>>>
>>>>>> Maybe the osd daemons access buffers on the wrong numa node.
>>>>>>
>>>>>> I'll try to reproduce the problem
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you force the degenerate case using numactl? To either affirm or
>>>>>> deny your suspicion.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >
>>>>>> À: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" <
>>>>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>> Envoyé: Mercredi 22 Avril 2015 10:40:05
>>>>>> Objet: [ceph-users] strange benchmark problem : restarting osd daemon
>>>>>> improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was doing some benchmarks,
>>>>>> I have found a strange behaviour.
>>>>>>
>>>>>> Using fio with the rbd engine, I was able to reach around 100k iops.
>>>>>> (osd data in linux buffers; iostat shows 0% disk access)
>>>>>>
>>>>>> then after restarting all osd daemons,
>>>>>>
>>>>>> the same fio benchmark now shows around 300k iops.
>>>>>> (osd data in linux buffers; iostat shows 0% disk access)
>>>>>>
>>>>>>
>>>>>> any ideas?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> before restarting osd
>>>>>> ---------------------
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>> fio-2.2.7-10-g51e9
>>>>>> Starting 10 processes
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0
>>>>>> iops] [eta 14m:45s]
>>>>>> fio: terminating on signal 2
>>>>>>
>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr
>>>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871, runt=
>>>>>> 26215msec slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 clat
>>>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec):
>>>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles
>>>>>> (usec):
>>>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
>>>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
>>>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
>>>>>> | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
>>>>>> | 99.99th=[47360]
>>>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82,
>>>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%,
>>>>>> 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) :
>>>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) : 100=0.01%
>>>>>> cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths :
>>>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit :
>>>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%,
>>>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0,
>>>>>> drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%,
>>>>>> depth=32
>>>>>>
>>>>>> Run status group 0 (all jobs):
>>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s,
>>>>>> mint=26215msec, maxt=26215msec
>>>>>>
>>>>>> Disk stats (read/write):
>>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>> ioengine=rbd, iodepth=32
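>>>>>>
>>>>>> The "fiorbd" job file isn't shown here; a minimal sketch matching the parameters above might be (pool and image names are assumptions):
>>>>>>
>>>>>>   [global]
>>>>>>   ioengine=rbd
>>>>>>   clientname=admin
>>>>>>   pool=rbd
>>>>>>   rbdname=fio-test
>>>>>>   rw=randread
>>>>>>   bs=4k
>>>>>>   iodepth=32
>>>>>>   numjobs=10
>>>>>>
>>>>>>   [rbd_iodepth32-test]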
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> AFTER RESTARTING OSDS
>>>>>> ----------------------
>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>> fio-2.2.7-10-g51e9
>>>>>> Starting 10 processes
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0
>>>>>> iops] [eta 01h:09m:27s]
>>>>>> fio: terminating on signal 2
>>>>>>
>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
>>>>>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec
>>>>>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
>>>>>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
>>>>>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
>>>>>>     clat percentiles (usec):
>>>>>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
>>>>>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
>>>>>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
>>>>>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
>>>>>>      | 99.99th=[436224]
>>>>>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
>>>>>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
>>>>>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
>>>>>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
>>>>>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
>>>>>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
>>>>>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
>>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
>>>>>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=32
>>>>>>
>>>>>> Run status group 0 (all jobs):
>>>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
>>>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>>>>>>
>>>>>> Disk stats (read/write):
>>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> CEPH LOG
>>>>>> --------
>>>>>>
>>>>>> before restarting osd
>>>>>> ----------------------
>>>>>>
>>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster
>>>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s
>>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster
>>>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s
>>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster
>>>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s
>>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster
>>>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s
>>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster
>>>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s
>>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster
>>>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s
>>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster
>>>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s
>>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster
>>>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s
>>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster
>>>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s
>>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster
>>>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s
>>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster
>>>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s
>>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster
>>>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s
>>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster
>>>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s
>>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster
>>>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s
>>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster
>>>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s
>>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster
>>>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s
>>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster
>>>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s
>>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster
>>>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s
>>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster
>>>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s
>>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster
>>>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB /
>>>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s
>>>>>>
>>>>>>
>>>>>> restarting osd
>>>>>> ---------------
>>>>>>
>>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
>>>>>> [INF] osd.0 marked itself down
>>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
>>>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
>>>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516
>>>>>> kB/s rd, 130 op/s
>>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
>>>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
>>>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
>>>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
>>>>>> [INF] osd.1 marked itself down
>>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
>>>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
>>>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
>>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
>>>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
>>>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
>>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
>>>>>> [INF] osd.2 marked itself down
>>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
>>>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
>>>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
>>>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
>>>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
>>>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
>>>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
>>>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
>>>>>> [INF] osd.1 10.7.0.152:6800/14848 boot
>>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
>>>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
>>>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
>>>>>> [INF] osd.3 marked itself down
>>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
>>>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
>>>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
>>>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped,
>>>>>> 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
>>>>>> [INF] osd.6 marked itself down
>>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
>>>>>> [INF] osd.2 10.7.0.152:6812/15410 boot
>>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
>>>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
>>>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
>>>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
>>>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
>>>>>> [INF] osd.3 10.7.0.152:6816/15971 boot
>>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
>>>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
>>>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
>>>>>> [INF] osd.7 marked itself down
>>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
>>>>>> [INF] osd.0 10.7.0.152:6804/14481 boot
>>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
>>>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
>>>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
>>>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped,
>>>>>> 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB
>>>>>> used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
>>>>>> [INF] osd.6 10.7.0.153:6808/21955 boot
>>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
>>>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
>>>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
>>>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28
>>>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
>>>>>> [INF] osd.8 marked itself down
>>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
>>>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
>>>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
>>>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11
>>>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
>>>>>> [INF] osd.7 10.7.0.153:6804/22536 boot
>>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
>>>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
>>>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
>>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
>>>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874
>>>>>> GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
>>>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
>>>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
>>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>>> active+undersized+degraded+remapped, 5 peering, 13 active
>>>>>>
>>>>>>
>>>>>> AFTER OSD RESTART
>>>>>> ------------------
>>>>>>
>>>>>>
>>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
>>>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s
>>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
>>>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s
>>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
>>>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s
>>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
>>>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s
>>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
>>>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s
>>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
>>>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s
>>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
>>>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s
>>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
>>>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s
>>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
>>>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s
>>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
>>>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s
>>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
>>>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s
>>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
>>>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s
>>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
>>>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s
>>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
>>>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s
>>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
>>>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s
>>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
>>>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s
>>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
>>>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB /
>>>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards, Irek Fasikhov
>>>> Mob.: +79229045757
>>>>
>>>>
>>>>
>>
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                         ` <553E652A.2060607-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-04-27 16:45                                                                           ` Alexandre DERUMIER
       [not found]                                                                             ` <205623974.712193182.1430153140877.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Alexandre DERUMIER @ 2015-04-27 16:45 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

Ok, just to make sure that I understand:

>>tcmalloc un-tuned: ~50k IOPS once bug sets in
yes, it's really random, but when hitting the bug, this is the worst I have seen.


>>tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
yes
>>jemalloc un-tuned: ~150k IOPS
It's more around 185k iops  (a little bit less than tcmalloc, with a little bit more cpu usage)




----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Envoyé: Lundi 27 Avril 2015 18:34:50
Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote: 
>>> Is it possible that you were suffering from the bug during the first 
>>> test but once reinstalled you hadn't hit it yet? 
> 
> yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
> I had patched it, but I think that wasn't enough.
> I have always hit this bug at random, but mainly when I have a "lot" of concurrent clients (20-40).
> more clients - lower iops.
> 
> 
> Today, I tried starting the osds with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M,
> and now it's working fine in all my benchmarks.
> 
> 
>>> That's a pretty major 
>>> performance swing. I'm not sure if we can draw any conclusions about 
>>> jemalloc vs tcmalloc until we can figure out what went wrong. 
> 
> From my bench, jemalloc uses a little bit more cpu than tcmalloc (maybe 1% or 2%).
> Tcmalloc seems to work better, with correct tuning of thread_cache_bytes.
> 
> 
> But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
> Maybe Somnath can tell us?

Ok, just to make sure that I understand: 

tcmalloc un-tuned: ~50k IOPS once bug sets in 
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS 
jemalloc un-tuned: ~150k IOPS 

Is that correct? Are there configurations/results I'm missing? 

Mark 

> 
> 
> ----- Mail original ----- 
> De: "Mark Nelson" <mnelson@redhat.com> 
> À: "aderumier" <aderumier@odiso.com> 
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
> Envoyé: Lundi 27 Avril 2015 16:54:34 
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
> 
> Hi Alex, 
> 
> Is it possible that you were suffering from the bug during the first 
> test but once reinstalled you hadn't hit it yet? That's a pretty major 
> performance swing. I'm not sure if we can draw any conclusions about 
> jemalloc vs tcmalloc until we can figure out what went wrong. 
> 
> Mark 
> 
> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote: 
>>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>> 
>> Ok, I really think I had patched tcmalloc wrongly.
>> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>> 
>> So better than jemalloc. 
>> 
>> 
>> ----- Mail original ----- 
>> De: "aderumier" <aderumier@odiso.com> 
>> À: "Mark Nelson" <mnelson@redhat.com> 
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>> Envoyé: Lundi 27 Avril 2015 07:01:21 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>> 
>> Hi, 
>> 
>> also another big difference, 
>> 
>> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>> 
>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>> 
>> 
>> ----- Mail original ----- 
>> De: "aderumier" <aderumier@odiso.com> 
>> À: "Mark Nelson" <mnelson@redhat.com> 
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>> Envoyé: Samedi 25 Avril 2015 06:45:43 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>> 
>>>> We haven't done any kind of real testing on jemalloc, so use at your own 
>>>> peril. Having said that, we've also been very interested in hearing 
>>>> community feedback from folks trying it out, so please feel free to give 
>>>> it a shot. :D 
>> 
>> Some feedback: I have run benchmarks all night, with no speed regression.
>> 
>> And I get a speed increase with fio with more jobs. (with tcmalloc, it seems to be the reverse)
>> 
>> with tcmalloc : 
>> 
>> 10 fio-rbd jobs = 300k iops 
>> 15 fio-rbd jobs = 290k iops 
>> 20 fio-rbd jobs = 270k iops 
>> 40 fio-rbd jobs = 250k iops 
>> 
>> (all with up and down values during the fio bench) 
>> 
>> 
>> with jemalloc: 
>> 
>> 10 fio-rbd jobs = 300k iops 
>> 15 fio-rbd jobs = 320k iops 
>> 20 fio-rbd jobs = 330k iops 
>> 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client machine, with 20 cores at 100%)
>> 
>> (all with constant values during the fio bench)
>> 
>> ----- Mail original ----- 
>> De: "Mark Nelson" <mnelson@redhat.com> 
>> À: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com> 
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com> 
>> Envoyé: Vendredi 24 Avril 2015 20:02:15 
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops 
>> 
>> We haven't done any kind of real testing on jemalloc, so use at your own 
>> peril. Having said that, we've also been very interested in hearing 
>> community feedback from folks trying it out, so please feel free to give 
>> it a shot. :D 
>> 
>> Mark 
>> 
>> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote: 
>>> Is jemalloc recommended in general? Does it also work for firefly?
>>> 
>>> Stefan 
>>> 
>>> Excuse my typo sent from my mobile phone. 
>>> 
>>> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>> 
>>>> Hi, 
>>>> 
>>>> I have finished rebuilding ceph with jemalloc,
>>>>
>>>> and everything seems to be working fine.
>>>>
>>>> I got a constant 300k iops for the moment, so no speed regression.
>>>>
>>>> I'll do longer benchmarks next week.
>>>> 
>>>> Regards, 
>>>> 
>>>> Alexandre 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "Irek Fasikhov" <malmyzh@gmail.com <mailto:malmyzh@gmail.com>> 
>>>> À: "Somnath Roy" <Somnath.Roy@sandisk.com 
>>>> <mailto:Somnath.Roy@sandisk.com>> 
>>>> Cc: "aderumier" <aderumier@odiso.com <mailto:aderumier@odiso.com>>, 
>>>> "Mark Nelson" <mnelson@redhat.com <mailto:mnelson@redhat.com>>, 
>>>> "ceph-users" <ceph-users@lists.ceph.com 
>>>> <mailto:ceph-users@lists.ceph.com>>, "ceph-devel" 
>>>> <ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>>, 
>>>> "Milosz Tanski" <milosz@adfin.com <mailto:milosz@adfin.com>> 
>>>> Envoyé: Vendredi 24 Avril 2015 13:37:52 
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Hi, Alexandre!
>>>> Have you tried changing the vm.min_free_kbytes parameter?
>>>> 
>>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>>>> 
>>>> 
>>>> Alexandre, 
>>>> You can configure with --with-jemalloc or ./do_autogen -J to build 
>>>> ceph with jemalloc. 
>>>> 
>>>> Thanks & Regards 
>>>> Somnath 
>>>> 
>>>> -----Original Message----- 
>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre
>>>> DERUMIER 
>>>> Sent: Thursday, April 23, 2015 4:56 AM 
>>>> To: Mark Nelson 
>>>> Cc: ceph-users; ceph-devel; Milosz Tanski 
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>>>> If you have the means to compile the same version of ceph with 
>>>>>> jemalloc, I would be very interested to see how it does. 
>>>> 
>>>> Yes, sure. (I have around 3-4 weeks to do all the benchmarks)
>>>> 
>>>> But I don't know how to do it.
>>>> I'm running the cluster on centos 7.1; maybe it would be easy to patch
>>>> the srpms to rebuild the package with jemalloc.
>>>> 
>>>> 
>>>> 
>>>> ----- Mail original ----- 
>>>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com> > 
>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com> >, 
>>>> "Srinivasula Maram" < Srinivasula.Maram@sandisk.com 
>>>> <mailto:Srinivasula.Maram@sandisk.com> > 
>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com 
>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" < 
>>>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >, 
>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> > 
>>>> Envoyé: Jeudi 23 Avril 2015 13:33:00 
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>> daemon improve performance from 100k iops to 300k iops 
>>>> 
>>>> Thanks for the testing Alexandre! 
>>>> 
>>>> If you have the means to compile the same version of ceph with 
>>>> jemalloc, I would be very interested to see how it does. 
>>>> 
>>>> In some ways I'm glad it turned out not to be NUMA. I still suspect we 
>>>> will have to deal with it at some point, but perhaps not today. ;) 
>>>> 
>>>> Mark 
>>>> 
>>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote: 
>>>>> Maybe it's tcmalloc related.
>>>>> I thought I had patched it correctly, but perf shows a lot of
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>> 
>>>>> before osd restart (100k) 
>>>>> ------------------ 
>>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle
>>>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock
>>>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
>>>>>  0.58% ceph-osd ceph-osd             [.] operator<
>>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule
>>>>> 
>>>>> 
>>>>> after osd restart (300k iops) 
>>>>> ------------------------------ 
>>>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle
>>>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock
>>>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
>>>>>  0.70% ceph-osd ceph-osd             [.] Message::Message
>>>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
>>>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule
>>>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
>>>>>  0.57% ceph-osd ceph-osd             [.] operator<
>>>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
>>>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
>>>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
>>>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf
>>>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light
>>>>> 
>>>>> ----- Original Message ----- 
>>>>> From: "aderumier" <aderumier@odiso.com> 
>>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>>>> Sent: Thursday, April 23, 2015 10:00:34 
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> Hi, 
>>>>> I'm hitting this bug again today. 
>>>>> 
>>>>> So it doesn't seem to be numa related (I tried flushing the linux 
>>>>> buffer cache to be sure). 
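>>>>> (i.e. sync; echo 3 > /proc/sys/vm/drop_caches on the osd nodes) 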
>>>>> 
>>>>> and tcmalloc is patched (I don't know how to verify that it's ok). 
>>>>> 
>>>>> I haven't restarted the osds yet. 
>>>>> 
>>>>> Maybe some perf traces could be useful? 
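>>>>> For example, something like this against the running osds (a rough 
>>>>> sketch): 
>>>>> 
>>>>> perf top -p $(pgrep -d, ceph-osd) 
>>>>> perf record -g -p $(pgrep -d, ceph-osd) -- sleep 30 && perf report 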
>>>>> 
>>>>> 
>>>>> ----- Original Message ----- 
>>>>> From: "aderumier" <aderumier@odiso.com> 
>>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com> 
>>>>> Sent: Wednesday, April 22, 2015 18:30:26 
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> Hi, 
>>>>> 
>>>>>>> I feel it is due to the tcmalloc issue 
>>>>> 
>>>>> Indeed, I had patched one of my nodes, but not the other. 
>>>>> So maybe I have hit this bug (but I can't confirm, I don't have 
>>>>> traces). 
>>>>> 
>>>>> But numa interleaving seems to help in my case (maybe not from 
>>>>> 100k->300k, but 250k->300k). 
>>>>> 
>>>>> I need to run longer tests to confirm that. 
>>>>> 
>>>>> 
>>>>> ----- Original Message ----- 
>>>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com> 
>>>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com> 
>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>>>>> Sent: Wednesday, April 22, 2015 16:34:33 
>>>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd 
>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> I feel it is due to the tcmalloc issue. 
>>>>> 
>>>>> I have seen a similar issue in my setup after 20 days. 
>>>>> 
>>>>> Thanks, 
>>>>> Srinivas 
>>>>> 
>>>>> 
>>>>> 
>>>>> -----Original Message----- 
>>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf 
>>>>> Of Mark Nelson 
>>>>> Sent: Wednesday, April 22, 2015 7:31 PM 
>>>>> To: Alexandre DERUMIER; Milosz Tanski 
>>>>> Cc: ceph-devel; ceph-users 
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>> 
>>>>> Hi Alexandre, 
>>>>> 
>>>>> We should discuss this at the perf meeting today. We knew NUMA node 
>>>>> affinity issues were going to crop up sooner or later (and indeed 
>>>>> already have in some cases), but this is pretty major. It's probably 
>>>>> time to really dig in and figure out how to deal with this. 
>>>>> 
>>>>> Note: this is one of the reasons I like small nodes with single 
>>>>> sockets and fewer OSDs. 
>>>>> 
>>>>> Mark 
>>>>> 
>>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote: 
>>>>>> Hi, 
>>>>>> 
>>>>>> I have done a lot of tests today, and it does seem to be numa related. 
>>>>>> 
>>>>>> My numastat was 
>>>>>> 
>>>>>> # numastat 
>>>>>>                     node0       node1 
>>>>>> numa_hit         99075422   153976877 
>>>>>> numa_miss       167490965     1493663 
>>>>>> numa_foreign      1493663   167491417 
>>>>>> interleave_hit     157745      167015 
>>>>>> local_node       99049179   153830554 
>>>>>> other_node      167517697     1639986 
>>>>>> 
>>>>>> So, a lot of misses. 
>>>>>> 
>>>>>> In this case, I can reproduce iops going from 85k to 300k, up 
>>>>>> and down. 
>>>>>> 
>>>>>> now setting 
>>>>>> echo 0 > /proc/sys/kernel/numa_balancing 
>>>>>> 
>>>>>> and starting osd daemons with 
>>>>>> 
>>>>>> numactl --interleave=all /usr/bin/ceph-osd 
>>>>>> 
>>>>>> 
>>>>>> I have a constant 300k iops ! 
>>>>>> 
>>>>>> 
>>>>>> I wonder if it could be improved by binding osd daemons to specific 
>>>>>> numa nodes (see the sketch below). 
>>>>>> I have 2 numa nodes of 10 cores with 6 osds, but I think it also 
>>>>>> requires ceph.conf osd thread tuning. 
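>>>>>> Something like this is what I have in mind (a sketch only; the 
>>>>>> osd-to-node mapping here is arbitrary): 
>>>>>> 
>>>>>> numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 ... 
>>>>>> numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 3 ... 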
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----- Original Message ----- 
>>>>>> From: "Milosz Tanski" <milosz@adfin.com> 
>>>>>> To: "aderumier" <aderumier@odiso.com> 
>>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>>>>>> Sent: Wednesday, April 22, 2015 12:54:23 
>>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd 
>>>>>> daemon improve performance from 100k iops to 300k iops 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER 
>>>>>> <aderumier@odiso.com> wrote: 
>>>>>> 
>>>>>> 
>>>>>> I wonder if it could be numa related, 
>>>>>> 
>>>>>> I'm using centos 7.1, 
>>>>>> and auto numa balancing is enabled 
>>>>>> 
>>>>>> cat /proc/sys/kernel/numa_balancing = 1 
>>>>>> 
>>>>>> Maybe the osd daemon accesses buffers on the wrong numa node. 
>>>>>> 
>>>>>> I'll try to reproduce the problem 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Can you force the degenerate case using numactl? To either affirm or 
>>>>>> deny your suspicion. 
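>>>>>> For example, binding cpu and memory to different nodes should force 
>>>>>> the cross-node case (sketch): 
>>>>>> 
>>>>>> numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 ... 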
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----- Original Message ----- 
>>>>>> From: "aderumier" <aderumier@odiso.com> 
>>>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com> 
>>>>>> Sent: Wednesday, April 22, 2015 10:40:05 
>>>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon 
>>>>>> improve performance from 100k iops to 300k iops 
>>>>>> 
>>>>>> Hi, 
>>>>>> 
>>>>>> I was doing some benchmarks, 
>>>>>> I found a strange behaviour. 
>>>>>> 
>>>>>> Using fio with rbd engine, I was able to reach around 100k iops. 
>>>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>>>> 
>>>>>> then after restarting all osd daemons, 
>>>>>> 
>>>>>> the same fio benchmark show now around 300k iops. 
>>>>>> (osd datas in linux buffer, iostat show 0% disk access) 
>>>>>> 
>>>>>> 
>>>>>> any ideas? 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> before restarting osd 
>>>>>> --------------------- 
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>>>> ioengine=rbd, iodepth=32 ... 
>>>>>> fio-2.2.7-10-g51e9 
>>>>>> Starting 10 processes 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 
>>>>>> iops] [eta 14m:45s] 
>>>>>> fio: terminating on signal 2 
>>>>>> 
>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015 
>>>>>>   read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec 
>>>>>>     slat (usec): min=5, max=3685, avg=16.89, stdev=17.38 
>>>>>>     clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 
>>>>>>      lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42 
>>>>>>     clat percentiles (usec): 
>>>>>>      |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247], 
>>>>>>      | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660], 
>>>>>>      | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656], 
>>>>>>      | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728], 
>>>>>>      | 99.99th=[47360] 
>>>>>>     bw (KB  /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95 
>>>>>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79% 
>>>>>>     lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% 
>>>>>>     lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% 
>>>>>>     lat (msec) : 100=0.01% 
>>>>>>   cpu          : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 
>>>>>>   IO depths    : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% 
>>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>>>>      complete  : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0% 
>>>>>>      issued    : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=32 
>>>>>> 
>>>>>> Run status group 0 (all jobs): 
>>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, 
>>>>>> mint=26215msec, maxt=26215msec 
>>>>>> 
>>>>>> Disk stats (read/write): 
>>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01% 
>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>>>> ioengine=rbd, iodepth=32 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> AFTER RESTARTING OSDS 
>>>>>> ---------------------- 
>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd 
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>>>> ioengine=rbd, iodepth=32 ... 
>>>>>> fio-2.2.7-10-g51e9 
>>>>>> Starting 10 processes 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> rbd engine: RBD version: 0.1.9 
>>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 
>>>>>> iops] [eta 01h:09m:27s] 
>>>>>> fio: terminating on signal 2 
>>>>>> 
>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015 
>>>>>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt=  7389msec 
>>>>>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35 
>>>>>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12 
>>>>>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28 
>>>>>>     clat percentiles (usec): 
>>>>>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450], 
>>>>>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796], 
>>>>>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720], 
>>>>>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792], 
>>>>>>      | 99.99th=[436224] 
>>>>>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82 
>>>>>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23% 
>>>>>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84% 
>>>>>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% 
>>>>>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01% 
>>>>>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832 
>>>>>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% 
>>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>>>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% 
>>>>>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=32 
>>>>>> 
>>>>>> Run status group 0 (all jobs): 
>>>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, 
>>>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec 
>>>>>> 
>>>>>> Disk stats (read/write): 
>>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03% 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> CEPH LOG 
>>>>>> -------- 
>>>>>> 
>>>>>> before restarting osd 
>>>>>> ---------------------- 
>>>>>> 
>>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster 
>>>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 298 MB/s rd, 76465 op/s 
>>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster 
>>>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 333 MB/s rd, 85355 op/s 
>>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster 
>>>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 343 MB/s rd, 87932 op/s 
>>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster 
>>>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 328 MB/s rd, 84151 op/s 
>>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster 
>>>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 237 MB/s rd, 60855 op/s 
>>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster 
>>>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 144 MB/s rd, 36935 op/s 
>>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster 
>>>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 321 MB/s rd, 82334 op/s 
>>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster 
>>>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 368 MB/s rd, 94211 op/s 
>>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster 
>>>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 244 MB/s rd, 62644 op/s 
>>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster 
>>>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 175 MB/s rd, 44997 op/s 
>>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster 
>>>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 122 MB/s rd, 31259 op/s 
>>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster 
>>>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 229 MB/s rd, 58674 op/s 
>>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster 
>>>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 271 MB/s rd, 69501 op/s 
>>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster 
>>>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 211 MB/s rd, 54020 op/s 
>>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster 
>>>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 164 MB/s rd, 42001 op/s 
>>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster 
>>>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 134 MB/s rd, 34380 op/s 
>>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster 
>>>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 293 MB/s rd, 75213 op/s 
>>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster 
>>>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 337 MB/s rd, 86353 op/s 
>>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster 
>>>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 229 MB/s rd, 58839 op/s 
>>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster 
>>>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 
>>>>>> 1295 GB avail; 152 MB/s rd, 39117 op/s 
>>>>>> 
>>>>>> 
>>>>>> restarting osd 
>>>>>> --------------- 
>>>>>> 
>>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster 
>>>>>> [INF] osd.0 marked itself down 
>>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster 
>>>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in 
>>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster 
>>>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>>>> 794 
>>>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 
>>>>>> kB/s rd, 130 op/s 
>>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster 
>>>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in 
>>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster 
>>>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>>>> 794 
>>>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster 
>>>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 
>>>>>> 794 
>>>>>> active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster 
>>>>>> [INF] osd.1 marked itself down 
>>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster 
>>>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in 
>>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster 
>>>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 
>>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>>>> 684 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster 
>>>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in 
>>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster 
>>>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 
>>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 
>>>>>> 684 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster 
>>>>>> [INF] osd.2 marked itself down 
>>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster 
>>>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in 
>>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster 
>>>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>>>> 600 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster 
>>>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in 
>>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster 
>>>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>>>> 600 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster 
>>>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck 
>>>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs 
>>>>>> undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2 
>>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster 
>>>>>> [INF] osd.1 10.7.0.152:6800/14848 boot 
>>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster 
>>>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in 
>>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster 
>>>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 
>>>>>> 600 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster 
>>>>>> [INF] osd.3 marked itself down 
>>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster 
>>>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in 
>>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster 
>>>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 
>>>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 
>>>>>> 506 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster 
>>>>>> [INF] osd.6 marked itself down 
>>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster 
>>>>>> [INF] osd.2 10.7.0.152:6812/15410 boot 
>>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster 
>>>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in 
>>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster 
>>>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>>>> 398 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster 
>>>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in 
>>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster 
>>>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>>>> 398 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster 
>>>>>> [INF] osd.3 10.7.0.152:6816/15971 boot 
>>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster 
>>>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in 
>>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster 
>>>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 
>>>>>> 398 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster 
>>>>>> [INF] osd.7 marked itself down 
>>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster 
>>>>>> [INF] osd.0 10.7.0.152:6804/14481 boot 
>>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster 
>>>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in 
>>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster 
>>>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 
>>>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 
>>>>>> 292 
>>>>>> active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB 
>>>>>> used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster 
>>>>>> [INF] osd.6 10.7.0.153:6808/21955 boot 
>>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster 
>>>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in 
>>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster 
>>>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 
>>>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28 
>>>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 
>>>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster 
>>>>>> [INF] osd.8 marked itself down 
>>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster 
>>>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in 
>>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster 
>>>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 
>>>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11 
>>>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 
>>>>>> 420 GB used, 874 GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster 
>>>>>> [INF] osd.7 10.7.0.153:6804/22536 boot 
>>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster 
>>>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in 
>>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster 
>>>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 
>>>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 
>>>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 
>>>>>> GB / 1295 GB avail 
>>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster 
>>>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in 
>>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster 
>>>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 
>>>>>> stale+active+remapped, 503 stale+active+clean, 4 
>>>>>> active+undersized+degraded+remapped, 5 peering, 13 active 
>>>>>> 
>>>>>> 
>>>>>> AFTER OSD RESTART 
>>>>>> ------------------ 
>>>>>> 
>>>>>> 
>>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster 
>>>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 786 MB/s rd, 196 kop/s 
>>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster 
>>>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 1578 MB/s rd, 394 kop/s 
>>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster 
>>>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 932 MB/s rd, 233 kop/s 
>>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster 
>>>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 627 MB/s rd, 156 kop/s 
>>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster 
>>>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 1034 MB/s rd, 258 kop/s 
>>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster 
>>>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 529 MB/s rd, 132 kop/s 
>>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster 
>>>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 770 MB/s rd, 192 kop/s 
>>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster 
>>>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 1358 MB/s rd, 339 kop/s 
>>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster 
>>>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 649 MB/s rd, 162 kop/s 
>>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster 
>>>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 580 MB/s rd, 145 kop/s 
>>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster 
>>>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 962 MB/s rd, 240 kop/s 
>>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster 
>>>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 506 MB/s rd, 126 kop/s 
>>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster 
>>>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 774 MB/s rd, 193 kop/s 
>>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster 
>>>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 1363 MB/s rd, 340 kop/s 
>>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster 
>>>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 662 MB/s rd, 165 kop/s 
>>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster 
>>>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 593 MB/s rd, 148 kop/s 
>>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster 
>>>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 
>>>>>> 1295 GB avail; 938 MB/s rd, 234 kop/s 
>>>>>> 
>>>> 
>>>> -- 
>>>> Best regards, Irek Fasikhov 
>>>> Mob.: +79229045757 
>>>> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                             ` <205623974.712193182.1430153140877.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2015-04-27 17:25                                                                               ` Somnath Roy
       [not found]                                                                                 ` <755F6B91B3BE364F9BCA11EA3F9E0C6F2CD8214C-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Somnath Roy @ 2015-04-27 17:25 UTC (permalink / raw)
  To: Alexandre DERUMIER, Mark Nelson; +Cc: ceph-users, ceph-devel, Milosz Tanski

Alexandre,
The moment you restart after hitting the tcmalloc trace, it will perform better irrespective of what value you set as the thread cache; that's what is happening in your case, I guess.
Yes, setting this value is kind of tricky and very much dependent on your setup/workload etc.
I would suggest setting it to ~128M and running your test longer, say ~10 hours or so.
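For example, exporting it into the osd's environment at start time (a
minimal sketch; any mechanism that sets the variable for the daemon
works, and 134217728 bytes = 128M):

TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 0 ...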

Thanks & Regards
Somnath
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
Sent: Monday, April 27, 2015 9:46 AM
To: Mark Nelson
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Ok, just to make sure that I understand:

>>tcmalloc un-tuned: ~50k IOPS once bug sets in
yes, it's really random, but when hitting the bug, this is the worst I have seen.


>>tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
yes
>>jemalloc un-tuned: ~150k IOPS
It's more around 185k iops (a little bit less than tcmalloc, with a little bit more cpu usage).




----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
Sent: Monday, April 27, 2015 18:34:50
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
>>> Is it possible that you were suffering from the bug during the first
>>> test but once reinstalled you hadn't hit it yet?
>
> yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
> I had patched it, but I think it's not enough.
> I always hit this bug at random, but mainly when I have a "lot" of concurrent clients (20-40):
> more clients - lower iops.
>
>
> Today, I tried starting the osds with
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M, and now it's working fine in all my benchmarks.
>
>
>>> That's a pretty major
>>> performance swing. I'm not sure if we can draw any conclusions about
>>> jemalloc vs tcmalloc until we can figure out what went wrong.
>
> From my bench, jemalloc uses a little bit more cpu than tcmalloc (maybe 1% or 2%).
> Tcmalloc seems to work better, with correct tuning of thread_cache_bytes.
>
>
> But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
> Maybe Somnath can tell us?

Ok, just to make sure that I understand:

tcmalloc un-tuned: ~50k IOPS once bug sets in
tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
jemalloc un-tuned: ~150k IOPS

Is that correct? Are there configurations/results I'm missing?

Mark

>
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@redhat.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Monday, April 27, 2015 16:54:34
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
> daemon improve performance from 100k iops to 300k iops
>
> Hi Alex,
>
> Is it possible that you were suffering from the bug during the first
> test but once reinstalled you hadn't hit it yet? That's a pretty major
> performance swing. I'm not sure if we can draw any conclusions about
> jemalloc vs tcmalloc until we can figure out what went wrong.
>
> Mark
>
> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>
>> Ok, I really think I had patched tcmalloc wrongly.
>> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>>
>> So better than jemalloc.
>>
>>
>> ----- Original Message -----
>> From: "aderumier" <aderumier@odiso.com>
>> To: "Mark Nelson" <mnelson@redhat.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Monday, April 27, 2015 07:01:21
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> Hi,
>>
>> Also, another big difference:
>>
>> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>>
>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>
>>
>> ----- Original Message -----
>> From: "aderumier" <aderumier@odiso.com>
>> To: "Mark Nelson" <mnelson@redhat.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Saturday, April 25, 2015 06:45:43
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>>>> We haven't done any kind of real testing on jemalloc, so use at
>>>> your own peril. Having said that, we've also been very interested
>>>> in hearing community feedback from folks trying it out, so please
>>>> feel free to give it a shot. :D
>>
>> Some feedback: I ran benchmarks all night, no speed regression.
>>
>> And I see a speed increase with fio with more jobs (with tcmalloc,
>> it seems to be the reverse).
>>
>> with tcmalloc :
>>
>> 10 fio-rbd jobs = 300k iops
>> 15 fio-rbd jobs = 290k iops
>> 20 fio-rbd jobs = 270k iops
>> 40 fio-rbd jobs = 250k iops
>>
>> (all with up and down values during the fio bench)
>>
>>
>> with jemalloc:
>>
>> 10 fio-rbd jobs = 300k iops
>> 15 fio-rbd jobs = 320k iops
>> 20 fio-rbd jobs = 330k iops
>> 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client
>> machine, with 20 cores at 100%)
>>
>> (all with constant values during the fio bench)
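>>
>> For reference, the fio rbd job file is roughly like this (a sketch;
>> pool/image names are placeholders):
>>
>> [global]
>> ioengine=rbd
>> clientname=admin
>> pool=rbd
>> rbdname=fio_test
>> rw=randread
>> bs=4k
>> iodepth=32
>> numjobs=10
>>
>> [rbd_iodepth32-test]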
>>
>> ----- Original Message -----
>> From: "Mark Nelson" <mnelson@redhat.com>
>> To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Friday, April 24, 2015 20:02:15
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> We haven't done any kind of real testing on jemalloc, so use at your
>> own peril. Having said that, we've also been very interested in
>> hearing community feedback from folks trying it out, so please feel
>> free to give it a shot. :D
>>
>> Mark
>>
>> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>>> Is jemalloc recommended in general? Does it also work for firefly?
>>>
>>> Stefan
>>>
>>> Excuse my typo sent from my mobile phone.
>>>
>>> On 24.04.2015 at 18:38, Alexandre DERUMIER
>>> <aderumier@odiso.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have finished rebuilding ceph with jemalloc,
>>>>
>>>> all seems to be working fine.
>>>>
>>>> I got a constant 300k iops for the moment, so no speed regression.
>>>>
>>>> I'll do longer benchmarks next week.
>>>>
>>>> Regards,
>>>>
>>>> Alexandre
>>>>
>>>> ----- Original Message -----
>>>> From: "Irek Fasikhov" <malmyzh@gmail.com>
>>>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>>>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>> Sent: Friday, April 24, 2015 13:37:52
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Hi, Alexandre!
>>>> Have you tried changing the parameter vm.min_free_kbytes?
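>>>> For instance (the value here is only an example):
>>>>
>>>> sysctl -w vm.min_free_kbytes=524288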
>>>>
>>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>>>>
>>>>
>>>> Alexandre,
>>>> You can configure with --with-jemalloc or ./do_autogen -J to build
>>>> ceph with jemalloc.
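>>>> e.g., from a source checkout (a sketch; assumes jemalloc-devel is
>>>> installed):
>>>>
>>>> ./autogen.sh && ./configure --with-jemalloc && make -j$(nproc)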
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com
>>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Alexandre
>>>> DERUMIER
>>>> Sent: Thursday, April 23, 2015 4:56 AM
>>>> To: Mark Nelson
>>>> Cc: ceph-users; ceph-devel; Milosz Tanski
>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting
>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>
>>>>>> If you have the means to compile the same version of ceph with
>>>>>> jemalloc, I would be very interested to see how it does.
>>>>
>>>> Yes, sure. (I have around 3-4 weeks to do all the benchs)
>>>>
>>>> But I don't know how to do it ?
>>>> I'm running the cluster on centos7.1, maybe it can be easy to patch
>>>> the srpms to rebuild the package with jemalloc.
>>>>
>>>>
>>>>
>>>> ----- Mail original -----
>>>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com>
>>>> >
>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>> >, "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" <
>>>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>> Envoyé: Jeudi 23 Avril 2015 13:33:00
>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>> daemon improve performance from 100k iops to 300k iops
>>>>
>>>> Thanks for the testing Alexandre!
>>>>
>>>> If you have the means to compile the same version of ceph with
>>>> jemalloc, I would be very interested to see how it does.
>>>>
>>>> In some ways I'm glad it turned out not to be NUMA. I still suspect
>>>> we will have to deal with it at some point, but perhaps not today.
>>>> ;)
>>>>
>>>> Mark
>>>>
>>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>>>> Maybe it's tcmalloc related
>>>>> I thinked to have patched it correctly, but perf show a lot of
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>
>>>>> before osd restart (100k)
>>>>> ------------------
>>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.]
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>> 8.51% ceph-osd libtcmalloc.so.4.1.2 [.]
>>>>> tcmalloc::CentralFreeList::FetchFromSpans
>>>>> 3.04% ceph-osd libtcmalloc.so.4.1.2 [.]
>>>>> tcmalloc::CentralFreeList::ReleaseToSpans
>>>>> 2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.63% swapper
>>>>> [kernel.kallsyms] [k] intel_idle 1.35% ceph-osd
>>>>> libtcmalloc.so.4.1.2 [.]
>>>>> tcmalloc::CentralFreeList::ReleaseListToSpans
>>>>> 1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete 1.07%
>>>>> ceph-osd
>>>>> libstdc++.so.6.0.19 [.] std::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >::basic_string 0.91%
>>>>> ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock 0.88%
>>>>> ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 0.81% ceph-osd
>>>>> ceph-osd [.] Mutex::Lock 0.79% ceph-osd [kernel.kallsyms] [k]
>>>>> copy_user_enhanced_fast_string 0.74% ceph-osd libpthread-2.17.so
>>>>> [.] pthread_mutex_unlock 0.67% ceph-osd [kernel.kallsyms] [k]
>>>>> _raw_spin_lock 0.63% swapper [kernel.kallsyms] [k]
>>>>> native_write_msr_safe 0.62% ceph-osd [kernel.kallsyms] [k]
>>>>> avc_has_perm_noaudit 0.58% ceph-osd ceph-osd [.] operator< 0.57%
>>>>> ceph-osd [kernel.kallsyms] [k] __schedule 0.57% ceph-osd
>>>>> [kernel.kallsyms] [k] __d_lookup_rcu 0.54% swapper
>>>>> [kernel.kallsyms] [k] __schedule
>>>>>
>>>>>
>>>>> after osd restart (300k iops)
>>>>> ------------------------------
>>>>> 3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new 1.92%
>>>>> ceph-osd
>>>>> libtcmalloc.so.4.1.2 [.] operator delete 1.86% swapper
>>>>> [kernel.kallsyms] [k] intel_idle 1.52% ceph-osd
>>>>> libstdc++.so.6.0.19 [.] std::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >::basic_string 1.34%
>>>>> ceph-osd
>>>>> libtcmalloc.so.4.1.2 [.]
>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>> 1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back 1.23% ceph-osd
>>>>> ceph-osd [.] Mutex::Lock 1.21% ceph-osd libpthread-2.17.so [.]
>>>>> pthread_mutex_trylock 1.11% ceph-osd [kernel.kallsyms] [k]
>>>>> copy_user_enhanced_fast_string 0.95% ceph-osd libpthread-2.17.so
>>>>> [.] pthread_mutex_unlock 0.94% ceph-osd [kernel.kallsyms] [k]
>>>>> _raw_spin_lock 0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>>> 0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg 0.70% ceph-osd
>>>>> ceph-osd [.] Message::Message 0.68% ceph-osd [kernel.kallsyms] [k]
>>>>> __schedule 0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu 0.65%
>>>>> ceph-osd libtcmalloc.so.4.1.2 [.]
>>>>> tcmalloc::CentralFreeList::FetchFromSpans
>>>>> 0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe 0.61%
>>>>> ceph-osd ceph-osd [.]
>>>>> std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>>> 0.60% swapper [kernel.kallsyms] [k] __schedule 0.60% ceph-osd
>>>>> libstdc++.so.6.0.19 [.] 0x00000000000bdd2b 0.57% ceph-osd ceph-osd
>>>>> libstdc++[.]
>>>>> operator< 0.57% ceph-osd ceph-osd [.] crc32_iscsi_00 0.56%
>>>>> ceph-osd
>>>>> libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose 0.55%
>>>>> libstdc++ceph-osd
>>>>> [kernel.kallsyms] [k] __switch_to 0.54% ceph-osd libc-2.17.so [.]
>>>>> vfprintf 0.52% ceph-osd [kernel.kallsyms] [k] fget_light
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>> >
>>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>
>>>>> >, "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Envoyé: Jeudi 23 Avril 2015 10:00:34
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi,
>>>>> I'm hitting this bug again today.
>>>>>
>>>>> So don't seem to be numa related (I have try to flush linux buffer
>>>>> to be sure).
>>>>>
>>>>> and tcmalloc is patched (I don't known how to verify that it's ok).
>>>>>
>>>>> I don't have restarted osd yet.
>>>>>
>>>>> Maybe some perf trace could be usefulll ?
>>>>>
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>> >
>>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>
>>>>> >, "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Envoyé: Mercredi 22 Avril 2015 18:30:26
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi,
>>>>>
>>>>>>> I feel it is due to tcmalloc issue
>>>>>
>>>>> Indeed, I had patched one of my node, but not the other.
>>>>> So maybe I have hit this bug. (but I can't confirm, I don't have
>>>>> traces).
>>>>>
>>>>> But numa interleaving seem to help in my case (maybe not from
>>>>> 100->300k, but 250k->300k).
>>>>>
>>>>> I need to do more long tests to confirm that.
>>>>>
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com>
>>>>> >, "aderumier"
>>>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski"
>>>>> < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>> Envoyé: Mercredi 22 Avril 2015 16:34:33
>>>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> I feel it is due to tcmalloc issue
>>>>>
>>>>> I have seen similar issue in my setup after 20 days.
>>>>>
>>>>> Thanks,
>>>>> Srinivas
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com
>>>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Mark
>>>>> Nelson
>>>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>>>> To: Alexandre DERUMIER; Milosz Tanski
>>>>> Cc: ceph-devel; ceph-users
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting
>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi Alexandre,
>>>>>
>>>>> We should discuss this at the perf meeting today. We knew NUMA
>>>>> node affinity issues were going to crop up sooner or later (and
>>>>> indeed already have in some cases), but this is pretty major. It's
>>>>> probably time to really dig in and figure out how to deal with this.
>>>>>
>>>>> Note: this is one of the reasons I like small nodes with single
>>>>> sockets and fewer OSDs.
>>>>>
>>>>> Mark
>>>>>
>>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have done a lot of test today, and it seem indeed numa related.
>>>>>>
>>>>>> My numastat was
>>>>>>
>>>>>> # numastat
>>>>>> node0 node1
>>>>>> numa_hit 99075422 153976877
>>>>>> numa_miss 167490965 1493663
>>>>>> numa_foreign 1493663 167491417
>>>>>> interleave_hit 157745 167015
>>>>>> local_node 99049179 153830554
>>>>>> other_node 167517697 1639986
>>>>>>
>>>>>> So, a lot of miss.
>>>>>>
>>>>>> In this case , I can reproduce ios going from 85k to 300k iops,
>>>>>> up and down.
>>>>>>
>>>>>> now setting
>>>>>> echo 0 > /proc/sys/kernel/numa_balancing
>>>>>>
>>>>>> and starting osd daemons with
>>>>>>
>>>>>> numactl --interleave=all /usr/bin/ceph-osd
>>>>>>
>>>>>>
>>>>>> I have a constant 300k iops !
>>>>>>
>>>>>>
>>>>>> I wonder if it could be improve by binding osd daemons to
>>>>>> specific numa node.
>>>>>> I have 2 numanode of 10 cores with 6 osd, but I think it also
>>>>>> require ceph.conf osd threads tunning.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com>
>>>>>> >
>>>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>>> >
>>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>> Envoyé: Mercredi 22 Avril 2015 12:54:23
>>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting
>>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <
>>>>>> aderumier@odiso.com <mailto:aderumier@odiso.com> > wrote:
>>>>>>
>>>>>>
>>>>>> I wonder if it could be numa related,
>>>>>>
>>>>>> I'm using centos 7.1,
>>>>>> and auto numa balancing is enabled
>>>>>>
>>>>>> cat /proc/sys/kernel/numa_balancing = 1
>>>>>>
>>>>>> Maybe the osd daemons access buffers on the wrong numa node.
>>>>>>
>>>>>> I'll try to reproduce the problem
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you force the degenerate case using numactl? To either affirm
>>>>>> or deny your suspicion.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "aderumier" < aderumier@odiso.com
>>>>>> <mailto:aderumier@odiso.com> >
>>>>>> À: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" <
>>>>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>> Envoyé: Mercredi 22 Avril 2015 10:40:05
>>>>>> Objet: [ceph-users] strange benchmark problem : restarting osd
>>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was doing some benchmarks,
>>>>>> I have found an strange behaviour.
>>>>>>
>>>>>> Using fio with rbd engine, I was able to reach around 100k iops.
>>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>>
>>>>>> then after restarting all osd daemons,
>>>>>>
>>>>>> the same fio benchmark show now around 300k iops.
>>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>>
>>>>>>
>>>>>> any ideas?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> before restarting osd
>>>>>> ---------------------
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>> fio-2.2.7-10-g51e9
>>>>>> Starting 10 processes
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s]
>>>>>> [96.6K/0/0 iops] [eta 14m:45s]
>>>>>> fio: terminating on signal 2
>>>>>>
>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed
>>>>>> Apr
>>>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871,
>>>>>> runt= 26215msec slat (usec): min=5, max=3685, avg=16.89,
>>>>>> stdev=17.38 clat
>>>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec):
>>>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles
>>>>>> (usec):
>>>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
>>>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
>>>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024],
>>>>>> | 95.00th=[14656], 99.00th=[25984], 99.50th=[30336],
>>>>>> | 99.90th=[38656], 99.95th=[41728], 99.99th=[47360]
>>>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82,
>>>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%,
>>>>>> 100=0.01%, 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) :
>>>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) :
>>>>>> 100=0.01% cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths :
>>>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit :
>>>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
>>>>>> issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0,
>>>>>> drop=r=0/w=0/d=0 latency : target=0, window=0,
>>>>>> percentile=100.00%,
>>>>>> depth=32
>>>>>>
>>>>>> Run status group 0 (all jobs):
>>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s,
>>>>>> maxb=451487KB/s, mint=26215msec, maxt=26215msec
>>>>>>
>>>>>> Disk stats (read/write):
>>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>> ioengine=rbd, iodepth=32
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> AFTER RESTARTING OSDS
>>>>>> ----------------------
>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>> fio-2.2.7-10-g51e9
>>>>>> Starting 10 processes
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> rbd engine: RBD version: 0.1.9
>>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s]
>>>>>> [296K/0/0 iops] [eta 01h:09m:27s]
>>>>>> fio: terminating on signal 2
>>>>>>
>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed
>>>>>> Apr
>>>>>> 22 10:02:28 2015 read : io=7655.7MB, bw=1036.8MB/s, iops=265218,
>>>>>> runt= 7389msec slat (usec): min=5, max=3406, avg=26.59,
>>>>>> stdev=40.35 clat
>>>>>> (usec): min=8, max=684328, avg=930.43, stdev=6419.12 lat (usec):
>>>>>> min=154, max=684342, avg=957.02, stdev=6419.28 clat percentiles
>>>>>> (usec):
>>>>>> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450],
>>>>>> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796],
>>>>>> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[
>>>>>> | 1720], 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920],
>>>>>> | 99.95th=[ 9792], 99.99th=[436224]
>>>>>> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46,
>>>>>> stdev=28263.82 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%,
>>>>>> 100=0.01%, 250=1.23% lat (usec) : 500=25.64%, 750=29.15%,
>>>>>> 1000=18.84% lat (msec)
>>>>>> : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% lat (msec) :
>>>>>> 250=0.01%, 500=0.02%, 750=0.01% cpu : usr=44.06%, sys=11.26%,
>>>>>> ctx=642620, majf=0, minf=6832 IO depths : 1=0.1%, 2=0.5%, 4=2.0%,
>>>>>> 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%,
>>>>>> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%,
>>>>>> 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% issued :
>>>>>> total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency :
>>>>>> target=0, window=0, percentile=100.00%, depth=32
>>>>>>
>>>>>> Run status group 0 (all jobs):
>>>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
>>>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>>>>>>
>>>>>> Disk stats (read/write):
>>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> CEPH LOG
>>>>>> --------
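(The entries below are the monitor's cluster log. For reference, the same stream can be followed live; the commands and log path are the usual defaults, not quoted from this setup:)

    ceph -w                           # follow the cluster log live
    tail -f /var/log/ceph/ceph.log    # or read it on a monitor host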
>>>>>>
>>>>>> before restarting osd
>>>>>> ----------------------
>>>>>>
>>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster
>>>>>> [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 298 MB/s rd, 76465 op/s
>>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster
>>>>>> [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 333 MB/s rd, 85355 op/s
>>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster
>>>>>> [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 343 MB/s rd, 87932 op/s
>>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster
>>>>>> [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 328 MB/s rd, 84151 op/s
>>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster
>>>>>> [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 237 MB/s rd, 60855 op/s
>>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster
>>>>>> [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 144 MB/s rd, 36935 op/s
>>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster
>>>>>> [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 321 MB/s rd, 82334 op/s
>>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster
>>>>>> [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 368 MB/s rd, 94211 op/s
>>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster
>>>>>> [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 244 MB/s rd, 62644 op/s
>>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster
>>>>>> [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 175 MB/s rd, 44997 op/s
>>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster
>>>>>> [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 122 MB/s rd, 31259 op/s
>>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster
>>>>>> [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 229 MB/s rd, 58674 op/s
>>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster
>>>>>> [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 271 MB/s rd, 69501 op/s
>>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster
>>>>>> [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 211 MB/s rd, 54020 op/s
>>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster
>>>>>> [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 164 MB/s rd, 42001 op/s
>>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster
>>>>>> [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 134 MB/s rd, 34380 op/s
>>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster
>>>>>> [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 293 MB/s rd, 75213 op/s
>>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster
>>>>>> [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 337 MB/s rd, 86353 op/s
>>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster
>>>>>> [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 229 MB/s rd, 58839 op/s
>>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster
>>>>>> [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 390 GB data, 391 GB used,
>>>>>> 904 GB / 1295 GB avail; 152 MB/s rd, 39117 op/s
>>>>>>
>>>>>>
>>>>>> restarting osd
>>>>>> ---------------
>>>>>>
>>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster
>>>>>> [INF] osd.0 marked itself down
>>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster
>>>>>> [INF] osdmap e849: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster
>>>>>> [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail;
>>>>>> 516 kB/s rd, 130 op/s
>>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster
>>>>>> [INF] osdmap e850: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster
>>>>>> [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster
>>>>>> [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8
>>>>>> stale+active+remapped, 106 stale+active+clean, 54 active+remapped,
>>>>>> 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster
>>>>>> [INF] osd.1 marked itself down
>>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster
>>>>>> [INF] osdmap e851: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster
>>>>>> [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13
>>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster
>>>>>> [INF] osdmap e852: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster
>>>>>> [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13
>>>>>> stale+active+remapped, 216 stale+active+clean, 49 active+remapped,
>>>>>> 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster
>>>>>> [INF] osd.2 marked itself down
>>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster
>>>>>> [INF] osdmap e853: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster
>>>>>> [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster
>>>>>> [INF] osdmap e854: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster
>>>>>> [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster
>>>>>> [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck
>>>>>> degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs
>>>>>> undersized; 3/9 in osds are down; clock skew detected on
>>>>>> mon.ceph1-2
>>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster
>>>>>> [INF] osd.1 10.7.0.152:6800/14848 boot
>>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster
>>>>>> [INF] osdmap e855: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster
>>>>>> [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22
>>>>>> stale+active+remapped, 300 stale+active+clean, 40 active+remapped,
>>>>>> 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster
>>>>>> [INF] osd.3 marked itself down
>>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster
>>>>>> [INF] osdmap e856: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster
>>>>>> [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25
>>>>>> stale+active+remapped, 394 stale+active+clean, 37 active+remapped,
>>>>>> 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster
>>>>>> [INF] osd.6 marked itself down
>>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster
>>>>>> [INF] osd.2 10.7.0.152:6812/15410 boot
>>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster
>>>>>> [INF] osdmap e857: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster
>>>>>> [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster
>>>>>> [INF] osdmap e858: 9 osds: 6 up, 9 in
>>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster
>>>>>> [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster
>>>>>> [INF] osd.3 10.7.0.152:6816/15971 boot
>>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster
>>>>>> [INF] osdmap e859: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster
>>>>>> [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30
>>>>>> stale+active+remapped, 502 stale+active+clean, 32 active+remapped,
>>>>>> 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster
>>>>>> [INF] osd.7 marked itself down
>>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster
>>>>>> [INF] osd.0 10.7.0.152:6804/14481 boot
>>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster
>>>>>> [INF] osdmap e860: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster
>>>>>> [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35
>>>>>> stale+active+remapped, 608 stale+active+clean, 27 active+remapped,
>>>>>> 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster
>>>>>> [INF] osd.6 10.7.0.153:6808/21955 boot
>>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster
>>>>>> [INF] osdmap e861: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster
>>>>>> [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27
>>>>>> stale+active+remapped, 498 stale+active+clean, 2 peering, 28
>>>>>> active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster
>>>>>> [INF] osd.8 marked itself down
>>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster
>>>>>> [INF] osdmap e862: 9 osds: 7 up, 9 in
>>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster
>>>>>> [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44
>>>>>> stale+active+remapped, 587 stale+active+clean, 2 peering, 11
>>>>>> active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data,
>>>>>> 420 GB used, 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster
>>>>>> [INF] osd.7 10.7.0.153:6804/22536 boot
>>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster
>>>>>> [INF] osdmap e863: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster
>>>>>> [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31
>>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>>> active+undersized+degraded+remapped, 5 peering, 13 active+remapped,
>>>>>> 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used,
>>>>>> 874 GB / 1295 GB avail
>>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster
>>>>>> [INF] osdmap e864: 9 osds: 8 up, 9 in
>>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster
>>>>>> [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31
>>>>>> stale+active+remapped, 503 stale+active+clean, 4
>>>>>> active+undersized+degraded+remapped, 5 peering, 13 active
>>>>>>
>>>>>>
>>>>>> AFTER OSD RESTART
>>>>>> ------------------
>>>>>>
>>>>>>
>>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster
>>>>>> [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 786 MB/s rd, 196 kop/s
>>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster
>>>>>> [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 1578 MB/s rd, 394 kop/s
>>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster
>>>>>> [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 932 MB/s rd, 233 kop/s
>>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster
>>>>>> [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 627 MB/s rd, 156 kop/s
>>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster
>>>>>> [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 1034 MB/s rd, 258 kop/s
>>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster
>>>>>> [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 529 MB/s rd, 132 kop/s
>>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster
>>>>>> [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 770 MB/s rd, 192 kop/s
>>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster
>>>>>> [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 1358 MB/s rd, 339 kop/s
>>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster
>>>>>> [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 649 MB/s rd, 162 kop/s
>>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster
>>>>>> [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 580 MB/s rd, 145 kop/s
>>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster
>>>>>> [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 962 MB/s rd, 240 kop/s
>>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster
>>>>>> [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 506 MB/s rd, 126 kop/s
>>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster
>>>>>> [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 774 MB/s rd, 193 kop/s
>>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster
>>>>>> [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 1363 MB/s rd, 340 kop/s
>>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster
>>>>>> [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 662 MB/s rd, 165 kop/s
>>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster
>>>>>> [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 593 MB/s rd, 148 kop/s
>>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster
>>>>>> [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62
>>>>>> active+remapped, 900 active+clean; 419 GB data, 421 GB used,
>>>>>> 874 GB / 1295 GB avail; 938 MB/s rd, 234 kop/s
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards, Фасихов Ирек Нургаязович
>>>> Mob.: +79229045757
>>>>
>>>>
>>>>
>>
>>
>>
>>
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
       [not found]                                                                                 ` <755F6B91B3BE364F9BCA11EA3F9E0C6F2CD8214C-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
@ 2015-04-27 17:41                                                                                   ` Mark Nelson
  2015-04-27 18:01                                                                                     ` [ceph-users] " Somnath Roy
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Nelson @ 2015-04-27 17:41 UTC (permalink / raw)
  To: Somnath Roy, Alexandre DERUMIER; +Cc: ceph-users, ceph-devel, Milosz Tanski

Hi Somnath,

Forgive me as I think this was discussed earlier in the thread, but did 
we confirm that the patch/fix/etc does not 100% fix the problem?

Mark

On 04/27/2015 12:25 PM, Somnath Roy wrote:
> Alexandre,
> The moment you restart after hitting the tcmalloc trace, irrespective of what value you set as thread cache, it will perform better, and that's what is happening in your case, I guess.
> Yes, setting this value is kind of tricky and very much dependent on your setup/workload etc.
> I would suggest setting it to ~128M and running your test longer, say ~10 hours or so.
>
> Thanks & Regards
> Somnath
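(A minimal sketch of applying that suggestion to a manually started osd; the byte count equals 128 MB, and the osd id and start method are illustrative, since init-script integration differs per distro:)

    # gperftools reads this environment variable at process start
    export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MB
    /usr/bin/ceph-osd -i 0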
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
> Sent: Monday, April 27, 2015 9:46 AM
> To: Mark Nelson
> Cc: ceph-users; ceph-devel; Milosz Tanski
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> Ok, just to make sure that I understand:
>
>>> tcmalloc un-tuned: ~50k IOPS once bug sets in
> yes, it's really random, but when hitting the bug, yes, this is the worst I have seen.
>
>
>>> tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
> yes
>>> jemalloc un-tuned: ~150k IOPS
> It's more around 185k iops (a little bit less than tcmalloc, with a little bit more cpu usage).
>
>
>
>
> ----- Mail original -----
> De: "Mark Nelson" <mnelson@redhat.com>
> À: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Envoyé: Lundi 27 Avril 2015 18:34:50
> Objet: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
>>>> Is it possible that you were suffering from the bug during the first
>>>> test but once reinstalled you hadn't hit it yet?
>>
>> yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
>> I had patched it, but I think that's not enough.
>> I always hit this bug at random, but mainly when I have a "lot" of concurrent clients (20-40).
>> The more clients, the lower the iops.
>>
>>
>> Today, I tried to start the osds with
>> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M, and now it's working fine in all my benches.
>>
>>
>>>> That's a pretty major
>>>> performance swing. I'm not sure if we can draw any conclusions about
>>>> jemalloc vs tcmalloc until we can figure out what went wrong.
>>
>>  From my bench, jemalloc uses a little bit more cpu than tcmalloc (maybe 1% or 2%).
>> Tcmalloc seems to work better, with correct tuning of thread_cache_bytes.
>>
>>
>> But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
>> Maybe Somnath can tell us?
>
> Ok, just to make sure that I understand:
>
> tcmalloc un-tuned: ~50k IOPS once bug sets in
> tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
> jemalloc un-tuned: ~150k IOPS
>
> Is that correct? Are there configurations/results I'm missing?
>
> Mark
>
>>
>>
>> ----- Mail original -----
>> De: "Mark Nelson" <mnelson@redhat.com>
>> À: "aderumier" <aderumier@odiso.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Envoyé: Lundi 27 Avril 2015 16:54:34
>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>> daemon improve performance from 100k iops to 300k iops
>>
>> Hi Alex,
>>
>> Is it possible that you were suffering from the bug during the first
>> test but once reinstalled you hadn't hit it yet? That's a pretty major
>> performance swing. I'm not sure if we can draw any conclusions about
>> jemalloc vs tcmalloc until we can figure out what went wrong.
>>
>> Mark
>>
>> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
>>>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>>
>>> Ok, I really think I had patched tcmalloc wrongly.
>>> I have repatched it and reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>>>
>>> So better than jemalloc.
>>>
>>>
>>> ----- Mail original -----
>>> De: "aderumier" <aderumier@odiso.com>
>>> À: "Mark Nelson" <mnelson@redhat.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Envoyé: Lundi 27 Avril 2015 07:01:21
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> Hi,
>>>
>>> also, another big difference:
>>>
>>> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>>>
>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>>
>>>
>>> ----- Mail original -----
>>> De: "aderumier" <aderumier@odiso.com>
>>> À: "Mark Nelson" <mnelson@redhat.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>>> <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Envoyé: Samedi 25 Avril 2015 06:45:43
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>>>> We haven't done any kind of real testing on jemalloc, so use at
>>>>> your own peril. Having said that, we've also been very interested
>>>>> in hearing community feedback from folks trying it out, so please
>>>>> feel free to give it a shot. :D
>>>
>>> Some feedback: I have run benches all night, no speed regression.
>>>
>>> And I get a speed increase with fio with more jobs. (with tcmalloc,
>>> it seems to be the reverse)
>>>
>>> with tcmalloc :
>>>
>>> 10 fio-rbd jobs = 300k iops
>>> 15 fio-rbd jobs = 290k iops
>>> 20 fio-rbd jobs = 270k iops
>>> 40 fio-rbd jobs = 250k iops
>>>
>>> (all with up and down values during the fio bench)
>>>
>>>
>>> with jemalloc:
>>>
>>> 10 fio-rbd jobs = 300k iops
>>> 15 fio-rbd jobs = 320k iops
>>> 20 fio-rbd jobs = 330k iops
>>> 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client
>>> machine, with 20 cores at 100%)
>>>
>>> (all with constant values during the fio bench)
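(For reference, a sketch of the kind of fio job file these runs imply, reconstructed from the parameters visible in the outputs; the pool, image, and client names are placeholders, not taken from the thread, and numjobs is the value varied above:)

    [global]
    ioengine=rbd
    clientname=admin     # placeholder cephx id
    pool=rbd             # placeholder pool name
    rbdname=testimg      # placeholder rbd image
    rw=randread
    bs=4k
    iodepth=32
    numjobs=10           # 10/15/20/40 in the runs above

    [rbd_iodepth32-test]

Run as: ./fio fiorbd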
>>>
>>> ----- Mail original -----
>>> De: "Mark Nelson" <mnelson@redhat.com>
>>> À: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier"
>>> <aderumier@odiso.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel"
>>> <ceph-devel@vger.kernel.org>, "Somnath Roy"
>>> <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
>>> Envoyé: Vendredi 24 Avril 2015 20:02:15
>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>> daemon improve performance from 100k iops to 300k iops
>>>
>>> We haven't done any kind of real testing on jemalloc, so use at your
>>> own peril. Having said that, we've also been very interested in
>>> hearing community feedback from folks trying it out, so please feel
>>> free to give it a shot. :D
>>>
>>> Mark
>>>
>>> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>>>> Is jemalloc recommended in general? Does it also work for firefly?
>>>>
>>>> Stefan
>>>>
>>>> Excuse my typo sent from my mobile phone.
>>>>
>>>> Am 24.04.2015 um 18:38 schrieb Alexandre DERUMIER
>>>> <aderumier@odiso.com
>>>> <mailto:aderumier@odiso.com>>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have finished rebuilding ceph with jemalloc,
>>>>>
>>>>> all seems to be working fine.
>>>>>
>>>>> I get a constant 300k iops for the moment, so no speed regression.
>>>>>
>>>>> I'll do longer benchmarks next week.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Alexandre
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "Irek Fasikhov" <malmyzh@gmail.com <mailto:malmyzh@gmail.com>>
>>>>> À: "Somnath Roy" <Somnath.Roy@sandisk.com
>>>>> <mailto:Somnath.Roy@sandisk.com>>
>>>>> Cc: "aderumier" <aderumier@odiso.com <mailto:aderumier@odiso.com>>,
>>>>> "Mark Nelson" <mnelson@redhat.com <mailto:mnelson@redhat.com>>,
>>>>> "ceph-users" <ceph-users@lists.ceph.com
>>>>> <mailto:ceph-users@lists.ceph.com>>, "ceph-devel"
>>>>> <ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>>,
>>>>> "Milosz Tanski" <milosz@adfin.com <mailto:milosz@adfin.com>>
>>>>> Envoyé: Vendredi 24 Avril 2015 13:37:52
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi, Alexandre!
>>>>> Did you try changing the parameter vm.min_free_kbytes?
>>>>>
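(A small sketch of inspecting and adjusting that sysctl; the value shown is purely illustrative and should be sized to the host's RAM:)

    sysctl vm.min_free_kbytes              # show the current floor
    sysctl -w vm.min_free_kbytes=262144    # example value only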
>>>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy < Somnath.Roy@sandisk.com
>>>>> <mailto:Somnath.Roy@sandisk.com> > :
>>>>>
>>>>>
>>>>> Alexandre,
>>>>> You can configure with --with-jemalloc or ./do_autogen -J to build
>>>>> ceph with jemalloc.
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
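(A sketch of the --with-jemalloc build path on an autotools-era ceph tree, assuming the jemalloc development headers are installed; package names vary by distro:)

    ./autogen.sh
    ./configure --with-jemalloc
    make -j"$(nproc)"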
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com
>>>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Alexandre
>>>>> DERUMIER
>>>>> Sent: Thursday, April 23, 2015 4:56 AM
>>>>> To: Mark Nelson
>>>>> Cc: ceph-users; ceph-devel; Milosz Tanski
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting
>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>>>> If you have the means to compile the same version of ceph with
>>>>>>> jemalloc, I would be very interested to see how it does.
>>>>>
>>>>> Yes, sure. (I have around 3-4 weeks to do all the benches)
>>>>>
>>>>> But I don't know how to do it.
>>>>> I'm running the cluster on centos 7.1; maybe it would be easy to patch
>>>>> the srpms to rebuild the package with jemalloc.
>>>>>
>>>>>
>>>>>
>>>>> ----- Mail original -----
>>>>> De: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com>
>>>>>>
>>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>>> , "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel" <
>>>>> ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org> >,
>>>>> "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>> Envoyé: Jeudi 23 Avril 2015 13:33:00
>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Thanks for the testing Alexandre!
>>>>>
>>>>> If you have the means to compile the same version of ceph with
>>>>> jemalloc, I would be very interested to see how it does.
>>>>>
>>>>> In some ways I'm glad it turned out not to be NUMA. I still suspect
>>>>> we will have to deal with it at some point, but perhaps not today.
>>>>> ;)
>>>>>
>>>>> Mark
>>>>>
>>>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>>>>> Maybe it's tcmalloc related.
>>>>>> I thought I had patched it correctly, but perf shows a lot of
>>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>>
>>>>>> before osd restart (100k)
>>>>>> ------------------
>>>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>>  1.63% swapper [kernel.kallsyms] [k] intel_idle
>>>>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>>  1.07% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>>  0.91% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>>>>>>  0.88% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>>>>>>  0.81% ceph-osd ceph-osd [.] Mutex::Lock
>>>>>>  0.79% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>>>>  0.74% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>>>>>>  0.67% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>>>>>>  0.63% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>>>>>  0.62% ceph-osd [kernel.kallsyms] [k] avc_has_perm_noaudit
>>>>>>  0.58% ceph-osd ceph-osd [.] operator<
>>>>>>  0.57% ceph-osd [kernel.kallsyms] [k] __schedule
>>>>>>  0.57% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>>>>  0.54% swapper [kernel.kallsyms] [k] __schedule
>>>>>>
>>>>>>
>>>>>> after osd restart (300k iops)
>>>>>> ------------------------------
>>>>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>>  1.86% swapper [kernel.kallsyms] [k] intel_idle
>>>>>>  1.52% ceph-osd libstdc++.so.6.0.19 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>>  1.24% ceph-osd libc-2.17.so [.] __memcpy_ssse3_back
>>>>>>  1.23% ceph-osd ceph-osd [.] Mutex::Lock
>>>>>>  1.21% ceph-osd libpthread-2.17.so [.] pthread_mutex_trylock
>>>>>>  1.11% ceph-osd [kernel.kallsyms] [k] copy_user_enhanced_fast_string
>>>>>>  0.95% ceph-osd libpthread-2.17.so [.] pthread_mutex_unlock
>>>>>>  0.94% ceph-osd [kernel.kallsyms] [k] _raw_spin_lock
>>>>>>  0.78% ceph-osd [kernel.kallsyms] [k] __d_lookup_rcu
>>>>>>  0.70% ceph-osd [kernel.kallsyms] [k] tcp_sendmsg
>>>>>>  0.70% ceph-osd ceph-osd [.] Message::Message
>>>>>>  0.68% ceph-osd [kernel.kallsyms] [k] __schedule
>>>>>>  0.66% ceph-osd [kernel.kallsyms] [k] idle_cpu
>>>>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>>  0.64% swapper [kernel.kallsyms] [k] native_write_msr_safe
>>>>>>  0.61% ceph-osd ceph-osd [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>>>>  0.60% swapper [kernel.kallsyms] [k] __schedule
>>>>>>  0.60% ceph-osd libstdc++.so.6.0.19 [.] 0x00000000000bdd2b
>>>>>>  0.57% ceph-osd ceph-osd [.] operator<
>>>>>>  0.57% ceph-osd ceph-osd [.] crc32_iscsi_00
>>>>>>  0.56% ceph-osd libstdc++.so.6.0.19 [.] std::string::_Rep::_M_dispose
>>>>>>  0.55% ceph-osd [kernel.kallsyms] [k] __switch_to
>>>>>>  0.54% ceph-osd libc-2.17.so [.] vfprintf
>>>>>>  0.52% ceph-osd [kernel.kallsyms] [k] fget_light
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>>>>
>>>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>
>>>>>>> , "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>>> Envoyé: Jeudi 23 Avril 2015 10:00:34
>>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi,
>>>>>> I'm hitting this bug again today.
>>>>>>
>>>>>> So it doesn't seem to be numa related (I have tried flushing the
>>>>>> linux buffers to be sure).
>>>>>>
>>>>>> and tcmalloc is patched (I don't know how to verify that it's ok).
>>>>>>
>>>>>> I haven't restarted the osds yet.
>>>>>>
>>>>>> Maybe some perf traces could be useful?
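(Two quick checks that fit here, sketched with illustrative commands; the pgrep pattern and sampling time are assumptions, not from the thread: first confirm which allocator the running osd actually loaded, then profile it while it is slow:)

    # which allocator did the osd load?
    pid=$(pgrep -f ceph-osd | head -n1)
    grep -oE 'lib(tcmalloc|jemalloc)[^ ]*' /proc/$pid/maps | sort -u

    # sample the busy osd for 30s and list the top consumers
    perf record -g -p "$pid" -- sleep 30
    perf report --stdio | head -n 40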
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>>>>
>>>>>> À: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>>> Cc: "ceph-users" < ceph-users@lists.ceph.com
>>>>>> <mailto:ceph-users@lists.ceph.com> >, "ceph-devel"
>>>>>> < ceph-devel@vger.kernel.org <mailto:ceph-devel@vger.kernel.org>
>>>>>>> , "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>>> Envoyé: Mercredi 22 Avril 2015 18:30:26
>>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting osd
>>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>>> I feel it is due to the tcmalloc issue
>>>>>>
>>>>>> Indeed, I had patched one of my nodes, but not the other.
>>>>>> So maybe I have hit this bug. (but I can't confirm, I don't have
>>>>>> traces).
>>>>>>
>>>>>> But numa interleaving seems to help in my case (maybe not from
>>>>>> 100->300k, but more like 250k->300k).
>>>>>>
>>>>>> I need to run longer tests to confirm that.
>>>>>>
>>>>>>
>>>>>> ----- Mail original -----
>>>>>> De: "Srinivasula Maram" < Srinivasula.Maram@sandisk.com
>>>>>> <mailto:Srinivasula.Maram@sandisk.com> >
>>>>>> À: "Mark Nelson" < mnelson@redhat.com <mailto:mnelson@redhat.com>
>>>>>>> , "aderumier"
>>>>>> < aderumier@odiso.com <mailto:aderumier@odiso.com> >, "Milosz Tanski"
>>>>>> < milosz@adfin.com <mailto:milosz@adfin.com> >
>>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>> Envoyé: Mercredi 22 Avril 2015 16:34:33
>>>>>> Objet: RE: [ceph-users] strange benchmark problem : restarting osd
>>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> I feel it is due to the tcmalloc issue.
>>>>>>
>>>>>> I have seen a similar issue in my setup after 20 days.
>>>>>>
>>>>>> Thanks,
>>>>>> Srinivas
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ceph-users [mailto: ceph-users-bounces@lists.ceph.com
>>>>>> <mailto:ceph-users-bounces@lists.ceph.com> ] On Behalf Of Mark
>>>>>> Nelson
>>>>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>>>>> To: Alexandre DERUMIER; Milosz Tanski
>>>>>> Cc: ceph-devel; ceph-users
>>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting
>>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi Alexandre,
>>>>>>
>>>>>> We should discuss this at the perf meeting today. We knew NUMA
>>>>>> node affinity issues were going to crop up sooner or later (and
>>>>>> indeed already have in some cases), but this is pretty major. It's
>>>>>> probably time to really dig in and figure out how to deal with this.
>>>>>>
>>>>>> Note: this is one of the reasons I like small nodes with single
>>>>>> sockets and fewer OSDs.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have done a lot of tests today, and it seems indeed numa related.
>>>>>>>
>>>>>>> My numastat was
>>>>>>>
>>>>>>> # numastat
>>>>>>>                     node0       node1
>>>>>>> numa_hit         99075422   153976877
>>>>>>> numa_miss       167490965     1493663
>>>>>>> numa_foreign      1493663   167491417
>>>>>>> interleave_hit     157745      167015
>>>>>>> local_node       99049179   153830554
>>>>>>> other_node      167517697     1639986
>>>>>>>
>>>>>>> So, a lot of miss.
>>>>>>>
>>>>>>> In this case, I can reproduce iops going from 85k to 300k,
>>>>>>> up and down.
>>>>>>>
>>>>>>> now setting
>>>>>>> echo 0 > /proc/sys/kernel/numa_balancing
>>>>>>>
>>>>>>> and starting osd daemons with
>>>>>>>
>>>>>>> numactl --interleave=all /usr/bin/ceph-osd
>>>>>>>
>>>>>>>
>>>>>>> I have a constant 300k iops !
>>>>>>>
>>>>>>>
>>>>>>> I wonder if it could be improved by binding osd daemons to
>>>>>>> specific numa nodes.
>>>>>>> I have 2 numa nodes of 10 cores with 6 osds, but I think it also
>>>>>>> requires tuning the osd thread counts in ceph.conf.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Mail original -----
>>>>>>> De: "Milosz Tanski" < milosz@adfin.com <mailto:milosz@adfin.com>
>>>>>>>>
>>>>>>> À: "aderumier" < aderumier@odiso.com <mailto:aderumier@odiso.com>
>>>>>>>>
>>>>>>> Cc: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users"
>>>>>>> < ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>>> Envoyé: Mercredi 22 Avril 2015 12:54:23
>>>>>>> Objet: Re: [ceph-users] strange benchmark problem : restarting
>>>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <
>>>>>>> aderumier@odiso.com <mailto:aderumier@odiso.com> > wrote:
>>>>>>>
>>>>>>>
>>>>>>> I wonder if it could be numa related,
>>>>>>>
>>>>>>> I'm using centos 7.1,
>>>>>>> and auto numa balancing is enabled
>>>>>>>
>>>>>>> cat /proc/sys/kernel/numa_balancing = 1
>>>>>>>
>>>>>>> Maybe the osd daemons access buffers on the wrong numa node.
>>>>>>>
>>>>>>> I'll try to reproduce the problem
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can you force the degenerate case using numactl? To either affirm
>>>>>>> or deny your suspicion.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Mail original -----
>>>>>>> De: "aderumier" < aderumier@odiso.com
>>>>>>> <mailto:aderumier@odiso.com> >
>>>>>>> À: "ceph-devel" < ceph-devel@vger.kernel.org
>>>>>>> <mailto:ceph-devel@vger.kernel.org> >, "ceph-users" <
>>>>>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> >
>>>>>>> Envoyé: Mercredi 22 Avril 2015 10:40:05
>>>>>>> Objet: [ceph-users] strange benchmark problem : restarting osd
>>>>>>> daemon improve performance from 100k iops to 300k iops
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was doing some benchmarks,
>>>>>>> I have found an strange behaviour.
>>>>>>>
>>>>>>> Using fio with rbd engine, I was able to reach around 100k iops.
>>>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>>>
>>>>>>> then after restarting all osd daemons,
>>>>>>>
>>>>>>> the same fio benchmark show now around 300k iops.
>>>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>>>
>>>>>>>
>>>>>>> any ideas?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> before restarting osd
>>>>>>> ---------------------
>>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>>> fio-2.2.7-10-g51e9
>>>>>>> Starting 10 processes
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s]
>>>>>>> [96.6K/0/0 iops] [eta 14m:45s]
>>>>>>> fio: terminating on signal 2
>>>>>>>
>>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed
>>>>>>> Apr
>>>>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871,
>>>>>>> runt= 26215msec slat (usec): min=5, max=3685, avg=16.89,
>>>>>>> stdev=17.38 clat
>>>>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec):
>>>>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles
>>>>>>> (usec):
>>>>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
>>>>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
>>>>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024],
>>>>>>> | 95.00th=[14656], 99.00th=[25984], 99.50th=[30336],
>>>>>>> | 99.90th=[38656], 99.95th=[41728], 99.99th=[47360]
>>>>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82,
>>>>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%,
>>>>>>> 100=0.01%, 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) :
>>>>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) :
>>>>>>> 100=0.01% cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths :
>>>>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit :
>>>>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
>>>>>>> issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0,
>>>>>>> drop=r=0/w=0/d=0 latency : target=0, window=0,
>>>>>>> percentile=100.00%,
>>>>>>> depth=32
>>>>>>>
>>>>>>> Run status group 0 (all jobs):
>>>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s,
>>>>>>> maxb=451487KB/s, mint=26215msec, maxt=26215msec
>>>>>>>
>>>>>>> Disk stats (read/write):
>>>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>>> ioengine=rbd, iodepth=32
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> AFTER RESTARTING OSDS
>>>>>>> ----------------------
>>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>>> fio-2.2.7-10-g51e9
>>>>>>> Starting 10 processes
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s]
>>>>>>> [296K/0/0 iops] [eta 01h:09m:27s]
>>>>>>> fio: terminating on signal 2
>>>>>>>
>>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
>>>>>>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt= 7389msec
>>>>>>>     slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
>>>>>>>     clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
>>>>>>>      lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
>>>>>>>     clat percentiles (usec):
>>>>>>>      |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
>>>>>>>      | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
>>>>>>>      | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
>>>>>>>      | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
>>>>>>>      | 99.99th=[436224]
>>>>>>>     bw (KB  /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
>>>>>>>     lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
>>>>>>>     lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
>>>>>>>     lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
>>>>>>>     lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
>>>>>>>   cpu          : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
>>>>>>>   IO depths    : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
>>>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>>>      complete  : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
>>>>>>>      issued    : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=32
>>>>>>>
>>>>>>> Run status group 0 (all jobs):
>>>>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
>>>>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>>>>>>>
>>>>>>> Disk stats (read/write):
>>>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> CEPH LOG
>>>>>>> --------
>>>>>>>
>>>>>>> before restarting osd
>>>>>>> ----------------------
>>>>>>>
>>>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 298 MB/s rd, 76465 op/s
>>>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 333 MB/s rd, 85355 op/s
>>>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 343 MB/s rd, 87932 op/s
>>>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 328 MB/s rd, 84151 op/s
>>>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 237 MB/s rd, 60855 op/s
>>>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 144 MB/s rd, 36935 op/s
>>>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 321 MB/s rd, 82334 op/s
>>>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 368 MB/s rd, 94211 op/s
>>>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 244 MB/s rd, 62644 op/s
>>>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 175 MB/s rd, 44997 op/s
>>>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 122 MB/s rd, 31259 op/s
>>>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58674 op/s
>>>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 271 MB/s rd, 69501 op/s
>>>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 211 MB/s rd, 54020 op/s
>>>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 164 MB/s rd, 42001 op/s
>>>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 134 MB/s rd, 34380 op/s
>>>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 293 MB/s rd, 75213 op/s
>>>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 337 MB/s rd, 86353 op/s
>>>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58839 op/s
>>>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 152 MB/s rd, 39117 op/s
>>>>>>>
>>>>>>>
>>>>>>> restarting osd
>>>>>>> ---------------
>>>>>>>
>>>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster [INF] osd.0 marked itself down
>>>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster [INF] osdmap e849: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 kB/s rd, 130 op/s
>>>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster [INF] osdmap e850: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster [INF] osd.1 marked itself down
>>>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster [INF] osdmap e851: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster [INF] osdmap e852: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster [INF] osd.2 marked itself down
>>>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster [INF] osdmap e853: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster [INF] osdmap e854: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>>>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster [INF] osd.1 10.7.0.152:6800/14848 boot
>>>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster [INF] osdmap e855: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster [INF] osd.3 marked itself down
>>>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster [INF] osdmap e856: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster [INF] osd.6 marked itself down
>>>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster [INF] osd.2 10.7.0.152:6812/15410 boot
>>>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster [INF] osdmap e857: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster [INF] osdmap e858: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster [INF] osd.3 10.7.0.152:6816/15971 boot
>>>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster [INF] osdmap e859: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster [INF] osd.7 marked itself down
>>>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster [INF] osd.0 10.7.0.152:6804/14481 boot
>>>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster [INF] osdmap e860: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster [INF] osd.6 10.7.0.153:6808/21955 boot
>>>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster [INF] osdmap e861: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 stale+active+remapped, 498 stale+active+clean, 2 peering, 28 active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster [INF] osd.8 marked itself down
>>>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster [INF] osdmap e862: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 stale+active+remapped, 587 stale+active+clean, 2 peering, 11 active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster [INF] osd.7 10.7.0.153:6804/22536 boot
>>>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster [INF] osdmap e863: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster [INF] osdmap e864: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active
>>>>>>>
>>>>>>>
>>>>>>> AFTER OSD RESTART
>>>>>>> ------------------
>>>>>>>
>>>>>>>
>>>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 786 MB/s rd, 196 kop/s
>>>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1578 MB/s rd, 394 kop/s
>>>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 932 MB/s rd, 233 kop/s
>>>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 627 MB/s rd, 156 kop/s
>>>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1034 MB/s rd, 258 kop/s
>>>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 529 MB/s rd, 132 kop/s
>>>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 770 MB/s rd, 192 kop/s
>>>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1358 MB/s rd, 339 kop/s
>>>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 649 MB/s rd, 162 kop/s
>>>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 580 MB/s rd, 145 kop/s
>>>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 962 MB/s rd, 240 kop/s
>>>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 506 MB/s rd, 126 kop/s
>>>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 774 MB/s rd, 193 kop/s
>>>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1363 MB/s rd, 340 kop/s
>>>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 662 MB/s rd, 165 kop/s
>>>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 593 MB/s rd, 148 kop/s
>>>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 938 MB/s rd, 234 kop/s
>>>>>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
  2015-04-27 17:41                                                                                   ` Mark Nelson
@ 2015-04-27 18:01                                                                                     ` Somnath Roy
  0 siblings, 0 replies; 35+ messages in thread
From: Somnath Roy @ 2015-04-27 18:01 UTC (permalink / raw)
  To: Mark Nelson, Alexandre DERUMIER; +Cc: ceph-users, ceph-devel, Milosz Tanski

Yes, the tcmalloc patch we applied is not meant to solve the trace we are seeing. The env variable code path was a noop in the tcmalloc code base, and the patch resolved that; setting the env variable now actually takes effect within tcmalloc.
This thread cache env variable is a performance workaround with tcmalloc that postpones the tcmalloc perf trace we are hitting :-) or may avoid it entirely, depending on your workload (workloads with objects <= 32K are vulnerable to hitting this).
Basically, given more cache, each thread builds a bigger free list, so it doesn't have to garbage collect or go to the central free list as often. Here is a nice article explaining this:

http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html

So, it is very difficult to predict the optimal value for this thread cache, which is why it will not solve the issue completely.
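
For illustration, setting it looks something like this (a minimal sketch; the restart command is an assumption and differs per distro/init system):

  # 128M thread cache, exported in the environment the osds start from
  export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
  /etc/init.d/ceph restart osd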

Thanks & Regards
Somnath

-----Original Message-----
From: Mark Nelson [mailto:mnelson@redhat.com] 
Sent: Monday, April 27, 2015 10:42 AM
To: Somnath Roy; Alexandre DERUMIER
Cc: ceph-users; ceph-devel; Milosz Tanski
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

Hi Somnath,

Forgive me as I think this was discussed earlier in the thread, but did 
we confirm that the patch/fix/etc does not 100% fix the problem?

Mark

On 04/27/2015 12:25 PM, Somnath Roy wrote:
> Alexandre,
> The moment you restart after hitting the tcmalloc trace, it will perform better irrespective of what value you set as the thread cache, and I guess that's what is happening in your case.
> Yes, setting this value is kind of tricky and very much dependent on your setup/workload, etc.
> I would suggest setting it to ~128M and running your test longer, say ~10 hours or so.
>
> Thanks & Regards
> Somnath
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
> Sent: Monday, April 27, 2015 9:46 AM
> To: Mark Nelson
> Cc: ceph-users; ceph-devel; Milosz Tanski
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> Ok, just to make sure that I understand:
>
>>> tcmalloc un-tuned: ~50k IOPS once bug sets in
> Yes, it's really random, but when hitting the bug, this is the worst I have seen.
>
>
>>> tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
> yes
>>> jemalloc un-tuned: ~150k IOPS
> It's more around 185k iops (a little bit less than tcmalloc, with a little bit more cpu usage).
>
>
>
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@redhat.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
> Sent: Monday, April 27, 2015 18:34:50
> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>
> On 04/27/2015 10:11 AM, Alexandre DERUMIER wrote:
>>>> Is it possible that you were suffering from the bug during the first
>>>> test but once reinstalled you hadn't hit it yet?
>>
>> Yes, I'm pretty sure I've been hitting the tcmalloc bug since the beginning.
>> I had patched it, but I think that's not enough.
>> I always hit this bug at random, but mainly when I have a "lot" of concurrent clients (20-40).
>> More clients - lower iops.
>>
>>
>> Today, I tried starting the osds with
>> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128M, and now it's working fine in all my benchs.
>>
>>
>>>> That's a pretty major
>>>> performance swing. I'm not sure if we can draw any conclusions about
>>>> jemalloc vs tcmalloc until we can figure out what went wrong.
>>
>>  From my bench, jemalloc uses a little bit more cpu than tcmalloc (maybe 1% or 2%).
>> tcmalloc seems to work better, with correct tuning of thread_cache_bytes.
>>
>>
>> But I don't know how to tune TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES correctly.
>> Maybe Somnath can tell us?
>
> Ok, just to make sure that I understand:
>
> tcmalloc un-tuned: ~50k IOPS once bug sets in
> tcmalloc with patch and 128MB thread cache bytes: ~195k IOPS
> jemalloc un-tuned: ~150k IOPS
>
> Is that correct? Are there configurations/results I'm missing?
>
> Mark
>
>>
>>
>> ----- Original Message -----
>> From: "Mark Nelson" <mnelson@redhat.com>
>> To: "aderumier" <aderumier@odiso.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> Sent: Monday, April 27, 2015 16:54:34
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>
>> Hi Alex,
>>
>> Is it possible that you were suffering from the bug during the first
>> test but once reinstalled you hadn't hit it yet? That's a pretty major
>> performance swing. I'm not sure if we can draw any conclusions about
>> jemalloc vs tcmalloc until we can figure out what went wrong.
>>
>> Mark
>>
>> On 04/27/2015 12:46 AM, Alexandre DERUMIER wrote:
>>>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>>
>>> Ok, I really think I had patched tcmalloc wrongly.
>>> I have repatched it, reinstalled it, and now I'm getting 195k iops with a single osd (10 fio rbd jobs, 4k randread).
>>>
>>> So better than jemalloc.
>>>
>>>
>>> ----- Original Message -----
>>> From: "aderumier" <aderumier@odiso.com>
>>> To: "Mark Nelson" <mnelson@redhat.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Monday, April 27, 2015 07:01:21
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>
>>> Hi,
>>>
>>> also, another big difference:
>>>
>>> I can now reach 180k iops with a single jemalloc osd (data in buffer) vs 50k iops max with tcmalloc.
>>>
>>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>>>
>>>
>>> ----- Original Message -----
>>> From: "aderumier" <aderumier@odiso.com>
>>> To: "Mark Nelson" <mnelson@redhat.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Saturday, April 25, 2015 06:45:43
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>
>>>>> We haven't done any kind of real testing on jemalloc, so use at
>>>>> your own peril. Having said that, we've also been very interested
>>>>> in hearing community feedback from folks trying it out, so please
>>>>> feel free to give it a shot. :D
>>>
>>> Some feedback: I have run benchmarks all night, no speed regression.
>>>
>>> And I see a speed increase with fio with more jobs (with tcmalloc,
>>> it seems to be the reverse).
>>>
>>> with tcmalloc :
>>>
>>> 10 fio-rbd jobs = 300k iops
>>> 15 fio-rbd jobs = 290k iops
>>> 20 fio-rbd jobs = 270k iops
>>> 40 fio-rbd jobs = 250k iops
>>>
>>> (all with up and down values during the fio bench)
>>>
>>>
>>> with jemalloc:
>>>
>>> 10 fio-rbd jobs = 300k iops
>>> 15 fio-rbd jobs = 320k iops
>>> 20 fio-rbd jobs = 330k iops
>>> 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client
>>> machine, with 20 cores at 100%)
>>>
>>> (all with constant values during the fio bench)
>>>
>>> ----- Original Message -----
>>> From: "Mark Nelson" <mnelson@redhat.com>
>>> To: "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Milosz Tanski" <milosz@adfin.com>
>>> Sent: Friday, April 24, 2015 20:02:15
>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>
>>> We haven't done any kind of real testing on jemalloc, so use at your
>>> own peril. Having said that, we've also been very interested in
>>> hearing community feedback from folks trying it out, so please feel
>>> free to give it a shot. :D
>>>
>>> Mark
>>>
>>> On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>>>> Is jemalloc recommended in general? Does it also work for firefly?
>>>>
>>>> Stefan
>>>>
>>>> Excuse my typos; sent from my mobile phone.
>>>>
>>>> On 24.04.2015 at 18:38, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have finished rebuilding ceph with jemalloc,
>>>>>
>>>>> and all seems to be working fine.
>>>>>
>>>>> I'm getting a constant 300k iops for the moment, so no speed regression.
>>>>>
>>>>> I'll do longer benchmarks next week.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Alexandre
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Irek Fasikhov" <malmyzh@gmail.com>
>>>>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>>>>> Cc: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>>> Sent: Friday, April 24, 2015 13:37:52
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Hi, Alexandre!
>>>>> Have you tried changing the parameter vm.min_free_kbytes?
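>>>>> For example (the value here is hypothetical, just for illustration):
>>>>>
>>>>>   sysctl -w vm.min_free_kbytes=262144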
>>>>>
>>>>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>>>>>
>>>>>
>>>>> Alexandre,
>>>>> You can configure with --with-jemalloc or ./do_autogen -J to build
>>>>> ceph with jemalloc.
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
>>>>> Sent: Thursday, April 23, 2015 4:56 AM
>>>>> To: Mark Nelson
>>>>> Cc: ceph-users; ceph-devel; Milosz Tanski
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting
>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>>>> If you have the means to compile the same version of ceph with
>>>>>>> jemalloc, I would be very interested to see how it does.
>>>>>
>>>>> Yes, sure. (I have around 3-4 weeks to do all the benchs)
>>>>>
>>>>> But I don't know how to do it.
>>>>> I'm running the cluster on centos7.1; maybe it would be easy to patch
>>>>> the srpms to rebuild the package with jemalloc.
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Mark Nelson" <mnelson@redhat.com>
>>>>> To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>>> Sent: Thursday, April 23, 2015 13:33:00
>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>
>>>>> Thanks for the testing Alexandre!
>>>>>
>>>>> If you have the means to compile the same version of ceph with
>>>>> jemalloc, I would be very interested to see how it does.
>>>>>
>>>>> In some ways I'm glad it turned out not to be NUMA. I still suspect
>>>>> we will have to deal with it at some point, but perhaps not today.
>>>>> ;)
>>>>>
>>>>> Mark
>>>>>
>>>>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>>>>>> Maybe it's tcmalloc related.
>>>>>> I thought I had patched it correctly, but perf shows a lot of
>>>>>> tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>>
>>>>>> before osd restart (100k)
>>>>>> ------------------
>>>>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>>>>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle
>>>>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>>>>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock
>>>>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
>>>>>>  0.58% ceph-osd ceph-osd             [.] operator<
>>>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule
>>>>>>
>>>>>>
>>>>>> after osd restart (300k iops)
>>>>>> ------------------------------
>>>>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>>>>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>>>>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle
>>>>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>>>>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>>>>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>>>>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock
>>>>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>>>>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>>>>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>>>>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>>>>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>>>>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
>>>>>>  0.70% ceph-osd ceph-osd             [.] Message::Message
>>>>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
>>>>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
>>>>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>>>>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>>>>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>>>>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule
>>>>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
>>>>>>  0.57% ceph-osd ceph-osd             [.] operator<
>>>>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
>>>>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
>>>>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
>>>>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf
>>>>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "aderumier" <aderumier@odiso.com>
>>>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>>>> Sent: Thursday, April 23, 2015 10:00:34
>>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi,
>>>>>> I'm hitting this bug again today.
>>>>>>
>>>>>> So it doesn't seem to be numa related (I have tried flushing the linux
>>>>>> buffers to be sure),
>>>>>>
>>>>>> and tcmalloc is patched (I don't know how to verify that it's ok).
>>>>>>
>>>>>> I haven't restarted the osds yet.
>>>>>>
>>>>>> Maybe some perf traces could be useful?
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "aderumier" <aderumier@odiso.com>
>>>>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>>>>>> Sent: Wednesday, April 22, 2015 18:30:26
>>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>>> I feel it is due to tcmalloc issue
>>>>>>
>>>>>> Indeed, I had patched one of my nodes, but not the other.
>>>>>> So maybe I have hit this bug (but I can't confirm, I don't have
>>>>>> traces).
>>>>>>
>>>>>> But numa interleaving seems to help in my case (maybe not from
>>>>>> 100->300k, but 250k->300k).
>>>>>>
>>>>>> I need to do longer tests to confirm that.
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>>>>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
>>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>>>> Sent: Wednesday, April 22, 2015 16:34:33
>>>>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> I feel it is due to the tcmalloc issue.
>>>>>>
>>>>>> I have seen a similar issue in my setup after 20 days.
>>>>>>
>>>>>> Thanks,
>>>>>> Srinivas
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
>>>>>> Sent: Wednesday, April 22, 2015 7:31 PM
>>>>>> To: Alexandre DERUMIER; Milosz Tanski
>>>>>> Cc: ceph-devel; ceph-users
>>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting
>>>>>> osd daemon improve performance from 100k iops to 300k iops
>>>>>>
>>>>>> Hi Alexandre,
>>>>>>
>>>>>> We should discuss this at the perf meeting today. We knew NUMA
>>>>>> node affinity issues were going to crop up sooner or later (and
>>>>>> indeed already have in some cases), but this is pretty major. It's
>>>>>> probably time to really dig in and figure out how to deal with this.
>>>>>>
>>>>>> Note: this is one of the reasons I like small nodes with single
>>>>>> sockets and fewer OSDs.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have done a lot of tests today, and it seems indeed numa related.
>>>>>>>
>>>>>>> My numastat was:
>>>>>>>
>>>>>>> # numastat
>>>>>>> node0 node1
>>>>>>> numa_hit 99075422 153976877
>>>>>>> numa_miss 167490965 1493663
>>>>>>> numa_foreign 1493663 167491417
>>>>>>> interleave_hit 157745 167015
>>>>>>> local_node 99049179 153830554
>>>>>>> other_node 167517697 1639986
>>>>>>>
>>>>>>> So, a lot of misses.
>>>>>>>
>>>>>>> In this case, I can reproduce ios going from 85k to 300k iops,
>>>>>>> up and down.
>>>>>>>
>>>>>>> now setting
>>>>>>> echo 0 > /proc/sys/kernel/numa_balancing
>>>>>>>
>>>>>>> and starting osd daemons with
>>>>>>>
>>>>>>> numactl --interleave=all /usr/bin/ceph-osd
>>>>>>>
>>>>>>>
>>>>>>> I have a constant 300k iops !
>>>>>>>
>>>>>>>
>>>>>>> I wonder if it could be improved by binding osd daemons to specific
>>>>>>> numa nodes; see the sketch below.
>>>>>>> I have 2 numa nodes of 10 cores with 6 osds, but I think it would also
>>>>>>> require ceph.conf osd thread tuning.
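>>>>>>>
>>>>>>> Something like this, per osd, might be a starting point (a hypothetical
>>>>>>> sketch; osd ids and node numbers are made up):
>>>>>>>
>>>>>>>   numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 ...
>>>>>>>   numactl --cpunodebind=1 --membind=1 /usr/bin/ceph-osd -i 1 ...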
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Milosz Tanski" <milosz@adfin.com>
>>>>>>> To: "aderumier" <aderumier@odiso.com>
>>>>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>>>>> Sent: Wednesday, April 22, 2015 12:54:23
>>>>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I wonder if it could be numa related,
>>>>>>>
>>>>>>> I'm using centos 7.1,
>>>>>>> and auto numa balancing is enabled:
>>>>>>>
>>>>>>> cat /proc/sys/kernel/numa_balancing = 1
>>>>>>>
>>>>>>> Maybe the osd daemons access buffers on the wrong numa node.
>>>>>>>
>>>>>>> I'll try to reproduce the problem
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can you force the degenerate case using numactl? To either affirm
>>>>>>> or deny your suspicion.
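>>>>>>>
>>>>>>> For instance, pinning cpu and memory to different nodes should force
>>>>>>> the worst case (a hypothetical invocation, remaining arguments elided):
>>>>>>>
>>>>>>>   numactl --cpunodebind=0 --membind=1 /usr/bin/ceph-osd -i 0 ...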
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "aderumier" <aderumier@odiso.com>
>>>>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>>>>>>> Sent: Wednesday, April 22, 2015 10:40:05
>>>>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was doing some benchmarks,
>>>>>>> I have found an strange behaviour.
>>>>>>>
>>>>>>> Using fio with rbd engine, I was able to reach around 100k iops.
>>>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>>>
>>>>>>> then after restarting all osd daemons,
>>>>>>>
>>>>>>> the same fio benchmark show now around 300k iops.
>>>>>>> (osd datas in linux buffer, iostat show 0% disk access)
>>>>>>>
>>>>>>>
>>>>>>> any ideas?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> before restarting osd
>>>>>>> ---------------------
>>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>>> fio-2.2.7-10-g51e9
>>>>>>> Starting 10 processes
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s]
>>>>>>> [96.6K/0/0 iops] [eta 14m:45s]
>>>>>>> fio: terminating on signal 2
>>>>>>>
>>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed
>>>>>>> Apr
>>>>>>> 22 10:00:04 2015 read : io=11558MB, bw=451487KB/s, iops=112871,
>>>>>>> runt= 26215msec slat (usec): min=5, max=3685, avg=16.89,
>>>>>>> stdev=17.38 clat
>>>>>>> (usec): min=5, max=62584, avg=2695.80, stdev=5351.23 lat (usec):
>>>>>>> min=109, max=62598, avg=2712.68, stdev=5350.42 clat percentiles
>>>>>>> (usec):
>>>>>>> | 1.00th=[ 155], 5.00th=[ 183], 10.00th=[ 205], 20.00th=[ 247],
>>>>>>> | 30.00th=[ 294], 40.00th=[ 354], 50.00th=[ 446], 60.00th=[ 660],
>>>>>>> | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024],
>>>>>>> | 95.00th=[14656], 99.00th=[25984], 99.50th=[30336],
>>>>>>> | 99.90th=[38656], 99.95th=[41728], 99.99th=[47360]
>>>>>>> bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82,
>>>>>>> stdev=28809.95 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%,
>>>>>>> 100=0.01%, 250=20.79% lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03% lat (msec) :
>>>>>>> 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37% lat (msec) :
>>>>>>> 100=0.01% cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710 IO depths :
>>>>>>> 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0% submit :
>>>>>>> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>>>> complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%,
>>>>>>>> =64=0.0% issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0,
>>>>>>> drop=r=0/w=0/d=0 latency : target=0, window=0,
>>>>>>> percentile=100.00%,
>>>>>>> depth=32
>>>>>>>
>>>>>>> Run status group 0 (all jobs):
>>>>>>> READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s,
>>>>>>> maxb=451487KB/s, mint=26215msec, maxt=26215msec
>>>>>>>
>>>>>>> Disk stats (read/write):
>>>>>>> sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>>> ioengine=rbd, iodepth=32
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> AFTER RESTARTING OSDS
>>>>>>> ----------------------
>>>>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>>>>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>>>>>>> ioengine=rbd, iodepth=32 ...
>>>>>>> fio-2.2.7-10-g51e9
>>>>>>> Starting 10 processes
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> rbd engine: RBD version: 0.1.9
>>>>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s]
>>>>>>> [296K/0/0 iops] [eta 01h:09m:27s]
>>>>>>> fio: terminating on signal 2
>>>>>>>
>>>>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed
>>>>>>> Apr
>>>>>>> 22 10:02:28 2015 read : io=7655.7MB, bw=1036.8MB/s, iops=265218,
>>>>>>> runt= 7389msec slat (usec): min=5, max=3406, avg=26.59,
>>>>>>> stdev=40.35 clat
>>>>>>> (usec): min=8, max=684328, avg=930.43, stdev=6419.12 lat (usec):
>>>>>>> min=154, max=684342, avg=957.02, stdev=6419.28 clat percentiles
>>>>>>> (usec):
>>>>>>> | 1.00th=[ 243], 5.00th=[ 314], 10.00th=[ 366], 20.00th=[ 450],
>>>>>>> | 30.00th=[ 524], 40.00th=[ 604], 50.00th=[ 692], 60.00th=[ 796],
>>>>>>> | 70.00th=[ 924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[
>>>>>>> | 1720], 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920],
>>>>>>> | 99.95th=[ 9792], 99.99th=[436224]
>>>>>>> bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46,
>>>>>>> stdev=28263.82 lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%,
>>>>>>> 100=0.01%, 250=1.23% lat (usec) : 500=25.64%, 750=29.15%,
>>>>>>> 1000=18.84% lat (msec)
>>>>>>> : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01% lat (msec) :
>>>>>>> 250=0.01%, 500=0.02%, 750=0.01% cpu : usr=44.06%, sys=11.26%,
>>>>>>> ctx=642620, majf=0, minf=6832 IO depths : 1=0.1%, 2=0.5%, 4=2.0%,
>>>>>>> 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%,
>>>>>>> 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%,
>>>>>>> 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0% issued :
>>>>>>> total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency :
>>>>>>> target=0, window=0, percentile=100.00%, depth=32
>>>>>>>
>>>>>>> Run status group 0 (all jobs):
>>>>>>> READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s,
>>>>>>> maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>>>>>>>
>>>>>>> Disk stats (read/write):
>>>>>>> sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> CEPH LOG
>>>>>>> --------
>>>>>>>
>>>>>>> before restarting osd
>>>>>>> ----------------------
>>>>>>>
>>>>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 298 MB/s rd, 76465 op/s
>>>>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 333 MB/s rd, 85355 op/s
>>>>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 343 MB/s rd, 87932 op/s
>>>>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 328 MB/s rd, 84151 op/s
>>>>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 237 MB/s rd, 60855 op/s
>>>>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 144 MB/s rd, 36935 op/s
>>>>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 321 MB/s rd, 82334 op/s
>>>>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 368 MB/s rd, 94211 op/s
>>>>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 244 MB/s rd, 62644 op/s
>>>>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 175 MB/s rd, 44997 op/s
>>>>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 122 MB/s rd, 31259 op/s
>>>>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58674 op/s
>>>>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 271 MB/s rd, 69501 op/s
>>>>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 211 MB/s rd, 54020 op/s
>>>>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 164 MB/s rd, 42001 op/s
>>>>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 134 MB/s rd, 34380 op/s
>>>>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 293 MB/s rd, 75213 op/s
>>>>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 337 MB/s rd, 86353 op/s
>>>>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58839 op/s
>>>>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 152 MB/s rd, 39117 op/s
>>>>>>>
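>>>>>>> Averaged over the twenty pgmap samples above, that is a little over
>>>>>>> 60k op/s. A one-liner to average the op/s readings from a captured
>>>>>>> log (the file name is an assumption):
>>>>>>>
>>>>>>> grep -o '[0-9]* op/s' ceph.log | awk '{s+=$1; n++} END {print s/n}'
>>>>>>>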
>>>>>>>
>>>>>>> restarting osd
>>>>>>> ---------------
>>>>>>>
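>>>>>>> The log below shows a rolling restart: each osd marks itself down and
>>>>>>> boots again a few seconds later (osd.0, 1, 2, 3, 6, 7, 8, ...). On each
>>>>>>> node that amounts to something like the following sketch (the sysvinit
>>>>>>> command form and the osd ids per node are assumptions; on systemd it
>>>>>>> would be "systemctl restart ceph-osd@$i"):
>>>>>>>
>>>>>>> for i in 0 1 2 3; do
>>>>>>>     /etc/init.d/ceph restart osd.$i   # restart one daemon at a time
>>>>>>> done
>>>>>>>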
>>>>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster [INF] osd.0 marked itself down
>>>>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster [INF] osdmap e849: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 kB/s rd, 130 op/s
>>>>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster [INF] osdmap e850: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster [INF] osd.1 marked itself down
>>>>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster [INF] osdmap e851: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster [INF] osdmap e852: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster [INF] osd.2 marked itself down
>>>>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster [INF] osdmap e853: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster [INF] osdmap e854: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>>>>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster [INF] osd.1 10.7.0.152:6800/14848 boot
>>>>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster [INF] osdmap e855: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster [INF] osd.3 marked itself down
>>>>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster [INF] osdmap e856: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster [INF] osd.6 marked itself down
>>>>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster [INF] osd.2 10.7.0.152:6812/15410 boot
>>>>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster [INF] osdmap e857: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster [INF] osdmap e858: 9 osds: 6 up, 9 in
>>>>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster [INF] osd.3 10.7.0.152:6816/15971 boot
>>>>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster [INF] osdmap e859: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster [INF] osd.7 marked itself down
>>>>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster [INF] osd.0 10.7.0.152:6804/14481 boot
>>>>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster [INF] osdmap e860: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster [INF] osd.6 10.7.0.153:6808/21955 boot
>>>>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster [INF] osdmap e861: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 stale+active+remapped, 498 stale+active+clean, 2 peering, 28 active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster [INF] osd.8 marked itself down
>>>>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster [INF] osdmap e862: 9 osds: 7 up, 9 in
>>>>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 stale+active+remapped, 587 stale+active+clean, 2 peering, 11 active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster [INF] osd.7 10.7.0.153:6804/22536 boot
>>>>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster [INF] osdmap e863: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>>>>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster [INF] osdmap e864: 9 osds: 8 up, 9 in
>>>>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active
>>>>>>>
>>>>>>>
>>>>>>> AFTER OSD RESTART
>>>>>>> ------------------
>>>>>>>
>>>>>>>
>>>>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 786 MB/s rd, 196 kop/s
>>>>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1578 MB/s rd, 394 kop/s
>>>>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 932 MB/s rd, 233 kop/s
>>>>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 627 MB/s rd, 156 kop/s
>>>>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1034 MB/s rd, 258 kop/s
>>>>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 529 MB/s rd, 132 kop/s
>>>>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 770 MB/s rd, 192 kop/s
>>>>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1358 MB/s rd, 339 kop/s
>>>>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 649 MB/s rd, 162 kop/s
>>>>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 580 MB/s rd, 145 kop/s
>>>>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 962 MB/s rd, 240 kop/s
>>>>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 506 MB/s rd, 126 kop/s
>>>>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 774 MB/s rd, 193 kop/s
>>>>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1363 MB/s rd, 340 kop/s
>>>>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 662 MB/s rd, 165 kop/s
>>>>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 593 MB/s rd, 148 kop/s
>>>>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 938 MB/s rd, 234 kop/s
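>>>>>>>
>>>>>>> As in the run before the restart, the whole data set sits in the page
>>>>>>> cache on the OSD nodes, so the data disks should stay near 0%
>>>>>>> utilisation while fio runs; a quick check on each OSD node:
>>>>>>>
>>>>>>> iostat -x 1   # expect ~0 r/s and ~0 %util on the OSD data disks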
>>>>>>>
>>>>> --
>>>>> Best regards, Irek Fasikhov
>>>>> Mobile: +79229045757
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


end of thread, other threads:[~2015-04-27 18:01 UTC | newest]

Thread overview: 35+ messages
2015-04-22  8:40 strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops Alexandre DERUMIER
     [not found] ` <1232683114.515741917.1429692005478.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-22  9:01   ` Alexandre DERUMIER
     [not found]     ` <990277439.516588145.1429693262093.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-22 10:54       ` Milosz Tanski
2015-04-22 13:56         ` [ceph-users] " Alexandre DERUMIER
     [not found]           ` <886679306.527978224.1429710961512.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-22 14:01             ` Mark Nelson
     [not found]               ` <5537A9A8.10200-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-04-22 14:09                 ` Alexandre DERUMIER
2015-04-22 14:34                 ` Srinivasula Maram
2015-04-22 16:30                   ` [ceph-users] " Alexandre DERUMIER
     [not found]                     ` <336199846.535234014.1429720226874.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-23  8:00                       ` Alexandre DERUMIER
2015-04-23  8:04                         ` 答复: [ceph-users] " 管清政
     [not found]                         ` <1806383776.554938002.1429776034371.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-23 10:58                           ` Alexandre DERUMIER
2015-04-23 11:33                             ` [ceph-users] " Mark Nelson
2015-04-23 11:56                               ` Alexandre DERUMIER
2015-04-23 16:24                                 ` Somnath Roy
     [not found]                                   ` <755F6B91B3BE364F9BCA11EA3F9E0C6F2CD806BE-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
2015-04-24 11:37                                     ` Irek Fasikhov
     [not found]                                       ` <CAF-rypyB=CjjJQs_+3Q=ELCVwpg9nyWKdBu8gVyuDQa49=GHtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-04-24 16:38                                         ` Alexandre DERUMIER
     [not found]                                           ` <1270073308.621430804.1429893511370.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-24 17:36                                             ` Stefan Priebe - Profihost AG
     [not found]                                               ` <BEEF4E3E-92BD-4E41-8E17-5BF9476A3674-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2015-04-24 18:02                                                 ` Mark Nelson
     [not found]                                                   ` <553A8527.3020503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-04-25  4:45                                                     ` Alexandre DERUMIER
     [not found]                                                       ` <1416266012.633995646.1429937143054.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-27  5:01                                                         ` Alexandre DERUMIER
     [not found]                                                           ` <94521284.683547748.1430110881936.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-27  5:46                                                             ` Alexandre DERUMIER
2015-04-27  6:12                                                               ` [ceph-users] " Venkateswara Rao Jujjuri
     [not found]                                                                 ` <CAKKTCLV9fVjprDFPs8YO+zrtox8g_1oOASWVxvY0RBwg8gFbRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-04-27  6:31                                                                   ` Alexandre DERUMIER
     [not found]                                                                     ` <195300554.685766176.1430116301190.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-27 15:27                                                                       ` Sage Weil
     [not found]                                                                         ` <alpine.DEB.2.00.1504270827080.5458-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-04-27 15:50                                                                           ` Dan van der Ster
     [not found]                                                               ` <1545533798.684536916.1430113581382.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-27 14:54                                                                 ` Mark Nelson
     [not found]                                                                   ` <553E4DAA.2070206-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-04-27 15:11                                                                     ` Alexandre DERUMIER
2015-04-27 16:34                                                                       ` [ceph-users] " Mark Nelson
     [not found]                                                                         ` <553E652A.2060607-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-04-27 16:45                                                                           ` Alexandre DERUMIER
     [not found]                                                                             ` <205623974.712193182.1430153140877.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
2015-04-27 17:25                                                                               ` Somnath Roy
     [not found]                                                                                 ` <755F6B91B3BE364F9BCA11EA3F9E0C6F2CD8214C-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
2015-04-27 17:41                                                                                   ` Mark Nelson
2015-04-27 18:01                                                                                     ` [ceph-users] " Somnath Roy
2015-04-24 18:38                                                 ` Somnath Roy
2015-04-24 18:41                                               ` Somnath Roy
2015-04-24 17:49                                             ` Milosz Tanski
