* Slow request in XFS
@ 2013-01-31 11:34 Chen, Xiaoxi
2013-01-31 12:43 ` Sage Weil
0 siblings, 1 reply; 7+ messages in thread
From: Chen, Xiaoxi @ 2013-01-31 11:34 UTC (permalink / raw)
To: ceph-devel
Hi list,
I just rebuilt my Ceph setup with 6 nodes (20 SATA disks + 4 SSDs as journals + 10GbE per node); the software stack is Ubuntu 12.10 + kernel 3.6.3 + XFS + Ceph 0.56.2. Before building the cluster I verified that every disk can sustain 90+ MB/s sequential writes and 100+ MB/s sequential reads, and that the bandwidth is stable.
Per nhm's suggestion, before running any tests on top of RBD, I ran rados bench and got the strange result below: there is bandwidth at first, but then some requests hang for a very long time, and the OSDs report slow requests. The cluster is idle except for rados bench.
Any input is highly appreciated. Thanks a lot!
root@ceph-1:~# rados bench -p data 30 write
Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
Object prefix: benchmark_data_ceph-1_39426
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 57 41 163.951 164 0.071567 0.0978894
2 16 57 41 81.9807 0 - 0.0978894
3 16 57 41 54.6555 0 - 0.0978894
4 16 57 41 40.9921 0 - 0.0978894
5 16 57 41 32.794 0 - 0.0978894
6 16 57 41 27.3285 0 - 0.0978894
7 16 57 41 23.4245 0 - 0.0978894
8 16 57 41 20.4965 0 - 0.0978894
9 16 57 41 18.2192 0 - 0.0978894
10 16 57 41 16.3973 0 - 0.0978894
11 16 57 41 14.9067 0 - 0.0978894
12 16 57 41 13.6645 0 - 0.0978894
13 16 57 41 12.6134 0 - 0.0978894
14 16 57 41 11.7124 0 - 0.0978894
15 16 57 41 10.9316 0 - 0.0978894
16 16 57 41 10.2484 0 - 0.0978894
17 16 57 41 9.64559 0 - 0.0978894
18 16 57 41 9.10974 0 - 0.0978894
19 16 57 41 8.63028 0 - 0.0978894
2013-01-31 19:25:55.282796min lat: 0.070066 max lat: 0.150938 avg lat: 0.0978894
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 57 41 8.19878 0 - 0.0978894
21 16 57 41 7.80834 0 - 0.0978894
22 16 57 41 7.45342 0 - 0.0978894
23 16 57 41 7.12936 0 - 0.0978894
24 16 57 41 6.83232 0 - 0.0978894
25 16 57 41 6.55903 0 - 0.0978894
26 16 57 41 6.30676 0 - 0.0978894
27 16 57 41 6.07318 0 - 0.0978894
28 16 57 41 5.85628 0 - 0.0978894
29 16 57 41 5.65435 0 - 0.0978894
30 16 57 41 5.46587 0 - 0.0978894
31 16 57 41 5.28955 0 - 0.0978894
32 16 57 41 5.12426 0 - 0.0978894
33 16 57 41 4.96898 0 - 0.0978894
34 16 57 41 4.82283 0 - 0.0978894
35 16 57 41 4.68504 0 - 0.0978894
36 16 57 41 4.5549 0 - 0.0978894
37 16 57 41 4.4318 0 - 0.0978894
38 16 57 41 4.31517 0 - 0.0978894
39 16 57 41 4.20452 0 - 0.0978894
2013-01-31 19:26:15.285538min lat: 0.070066 max lat: 0.150938 avg lat: 0.0978894
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
40 16 57 41 4.09941 0 - 0.0978894
41 16 57 41 3.99942 0 - 0.0978894
42 16 57 41 3.9042 0 - 0.0978894
43 16 57 41 3.81341 0 - 0.0978894
44 16 57 41 3.72674 0 - 0.0978894
45 16 57 41 3.64392 0 - 0.0978894
46 16 57 41 3.56471 0 - 0.0978894
47 16 57 41 3.48886 0 - 0.0978894
48 16 57 41 3.41618 0 - 0.0978894
49 16 57 41 3.34646 0 - 0.0978894
50 15 58 43 3.43951 0.163265 0.07674 1.22647
51 15 58 43 3.37207 0 - 1.22647
52 15 58 43 3.30722 0 - 1.22647
53 15 58 43 3.24482 0 - 1.22647
54 15 58 43 3.18473 0 - 1.22647
55 15 58 43 3.12683 0 - 1.22647
56 15 58 43 3.071 0 - 1.22647
57 15 58 43 3.01712 0 - 1.22647
58 15 58 43 2.9651 0 - 1.22647
59 15 58 43 2.91484 0 - 1.22647
2013-01-31 19:26:35.288266min lat: 0.070066 max lat: 48.648 avg lat: 1.22647
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
60 15 58 43 2.86626 0 - 1.22647
61 15 58 43 2.81927 0 - 1.22647
62 15 58 43 2.7738 0 - 1.22647
63 15 58 43 2.72977 0 - 1.22647
64 15 58 43 2.68712 0 - 1.22647
65 15 58 43 2.64578 0 - 1.22647
66 15 58 43 2.60569 0 - 1.22647
67 15 58 43 2.5668 0 - 1.22647
68 15 58 43 2.52905 0 - 1.22647
69 15 58 43 2.4924 0 - 1.22647
70 15 58 43 2.45679 0 - 1.22647
71 15 58 43 2.42219 0 - 1.22647
72 15 58 43 2.38855 0 - 1.22647
73 15 58 43 2.35583 0 - 1.22647
74 15 58 43 2.32399 0 - 1.22647
75 13 58 45 2.39966 0.32 73.9839 4.45992
76 13 58 45 2.36809 0 - 4.45992
77 13 58 45 2.33733 0 - 4.45992
78 13 58 45 2.30737 0 - 4.45992
79 13 58 45 2.27816 0 - 4.45992
2013-01-31 19:26:55.291112min lat: 0.070066 max lat: 73.9839 avg lat: 4.45992
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
80 13 58 45 2.24968 0 - 4.45992
81 13 58 45 2.22191 0 - 4.45992
82 13 58 45 2.19481 0 - 4.45992
83 13 58 45 2.16837 0 - 4.45992
84 13 58 45 2.14255 0 - 4.45992
85 13 58 45 2.11735 0 - 4.45992
86 13 58 45 2.09273 0 - 4.45992
87 13 58 45 2.06867 0 - 4.45992
88 13 58 45 2.04517 0 - 4.45992
89 13 58 45 2.02219 0 - 4.45992
90 13 58 45 1.99972 0 - 4.45992
91 13 58 45 1.97774 0 - 4.45992
92 13 58 45 1.95625 0 - 4.45992
93 13 58 45 1.93521 0 - 4.45992
94 13 58 45 1.91462 0 - 4.45992
95 13 58 45 1.89447 0 - 4.45992
96 13 58 45 1.87474 0 - 4.45992
97 13 58 45 1.85541 0 - 4.45992
98 13 58 45 1.83647 0 - 4.45992
99 13 58 45 1.81792 0 - 4.45992
2013-01-31 19:27:15.293969min lat: 0.070066 max lat: 73.9839 avg lat: 4.45992
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
100 13 58 45 1.79975 0 - 4.45992
101 13 58 45 1.78193 0 - 4.45992
102 13 58 45 1.76446 0 - 4.45992
103 13 58 45 1.74732 0 - 4.45992
104 13 58 45 1.73052 0 - 4.45992
105 13 58 45 1.71404 0 - 4.45992
106 13 58 45 1.69787 0 - 4.45992
107 13 58 45 1.682 0 - 4.45992
108 13 58 45 1.66643 0 - 4.45992
109 13 58 45 1.65114 0 - 4.45992
110 13 58 45 1.63613 0 - 4.45992
111 13 58 45 1.62139 0 - 4.45992
112 13 58 45 1.60691 0 - 4.45992
113 13 58 45 1.59269 0 - 4.45992
114 13 58 45 1.57872 0 - 4.45992
115 13 58 45 1.565 0 - 4.45992
116 13 58 45 1.5515 0 - 4.45992
117 13 58 45 1.53824 0 - 4.45992
118 13 58 45 1.52521 0 - 4.45992
119 13 58 45 1.51239 0 - 4.45992
2013-01-31 19:27:35.296900min lat: 0.070066 max lat: 73.9839 avg lat: 4.45992
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
120 13 58 45 1.49979 0 - 4.45992
121 13 58 45 1.48739 0 - 4.45992
122 13 58 45 1.4752 0 - 4.45992
123 13 58 45 1.46321 0 - 4.45992
124 13 58 45 1.4514 0 - 4.45992
125 13 58 45 1.43979 0 - 4.45992
126 13 58 45 1.42837 0 - 4.45992
127 13 58 45 1.41712 0 - 4.45992
128 13 58 45 1.40605 0 - 4.45992
129 13 58 45 1.39515 0 - 4.45992
130 13 58 45 1.38442 0 - 4.45992
131 13 58 45 1.37385 0 - 4.45992
132 13 58 45 1.36344 0 - 4.45992
133 13 58 45 1.35319 0 - 4.45992
134 13 58 45 1.34309 0 - 4.45992
135 13 58 45 1.33314 0 - 4.45992
136 13 58 45 1.32334 0 - 4.45992
137 13 58 45 1.31368 0 - 4.45992
138 13 58 45 1.30416 0 - 4.45992
139 13 58 45 1.29478 0 - 4.45992
2013-01-31 19:27:55.299876min lat: 0.070066 max lat: 73.9839 avg lat: 4.45992
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
140 13 58 45 1.28553 0 - 4.45992
141 13 58 45 1.27641 0 - 4.45992
142 13 58 45 1.26742 0 - 4.45992
143 13 58 45 1.25856 0 - 4.45992
144 13 58 45 1.24982 0 - 4.45992
145 13 58 45 1.2412 0 - 4.45992
146 13 58 45 1.2327 0 - 4.45992
147 13 58 45 1.22431 0 - 4.45992
148 13 58 45 1.21604 0 - 4.45992
149 13 58 45 1.20788 0 - 4.45992
150 13 58 45 1.19983 0 - 4.45992
151 13 58 45 1.19188 0 - 4.45992
152 13 58 45 1.18404 0 - 4.45992
153 13 58 45 1.1763 0 - 4.45992
154 13 58 45 1.16866 0 - 4.45992
155 13 58 45 1.16112 0 - 4.45992
156 13 58 45 1.15368 0 - 4.45992
157 13 58 45 1.14633 0 - 4.45992
158 13 58 45 1.13908 0 - 4.45992
159 13 58 45 1.13191 0 - 4.45992
2013-01-31 19:28:15.302677min lat: 0.070066 max lat: 73.9839 avg lat: 4.45992
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
160 13 58 45 1.12484 0 - 4.45992
161 13 58 45 1.11785 0 - 4.45992
162 13 58 45 1.11095 0 - 4.45992
163 13 58 45 1.10414 0 - 4.45992
164 13 58 45 1.0974 0 - 4.45992
165 13 58 45 1.09075 0 - 4.45992
166 13 58 45 1.08418 0 - 4.45992
167 13 58 45 1.07769 0 - 4.45992
168 13 58 45 1.07128 0 - 4.45992
169 13 58 45 1.06494 0 - 4.45992
170 13 58 45 1.05867 0 - 4.45992
171 3 58 55 1.28637 0.416667 170.042 34.5177
172 3 58 55 1.27889 0 - 34.5177
173 3 58 55 1.27149 0 - 34.5177
174 3 58 55 1.26419 0 - 34.5177
175 3 58 55 1.25696 0 - 34.5177
176 3 58 55 1.24982 0 - 34.5177
177 3 58 55 1.24276 0 - 34.5177
178 3 58 55 1.23578 0 - 34.5177
179 3 58 55 1.22887 0 - 34.5177
2013-01-31 19:28:35.305621min lat: 0.070066 max lat: 170.042 avg lat: 34.5177
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
180 3 58 55 1.22205 0 - 34.5177
181 3 58 55 1.2153 0 - 34.5177
182 3 58 55 1.20862 0 - 34.5177
183 3 58 55 1.20201 0 - 34.5177
184 3 58 55 1.19548 0 - 34.5177
185 3 58 55 1.18902 0 - 34.5177
186 3 58 55 1.18263 0 - 34.5177
187 3 58 55 1.1763 0 - 34.5177
188 3 58 55 1.17004 0 - 34.5177
189 3 58 55 1.16385 0 - 34.5177
190 3 58 55 1.15773 0 - 34.5177
191 3 58 55 1.15167 0 - 34.5177
192 3 58 55 1.14567 0 - 34.5177
193 3 58 55 1.13973 0 - 34.5177
194 3 58 55 1.13386 0 - 34.5177
195 3 58 55 1.12804 0 - 34.5177
196 3 58 55 1.12229 0 - 34.5177
197 3 58 55 1.11659 0 - 34.5177
198 3 58 55 1.11095 0 - 34.5177
199 3 58 55 1.10537 0 - 34.5177
2013-01-31 19:28:55.308524min lat: 0.070066 max lat: 170.042 avg lat: 34.5177
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
200 3 58 55 1.09984 0 - 34.5177
201 3 58 55 1.09437 0 - 34.5177
202 3 58 55 1.08895 0 - 34.5177
203 3 58 55 1.08359 0 - 34.5177
204 3 58 55 1.07828 0 - 34.5177
205 3 58 55 1.07302 0 - 34.5177
206 3 58 55 1.06781 0 - 34.5177
207 3 58 55 1.06265 0 - 34.5177
208 3 58 55 1.05754 0 - 34.5177
209 3 58 55 1.05248 0 - 34.5177
210 3 58 55 1.04747 0 - 34.5177
211 3 58 55 1.0425 0 - 34.5177
212 3 58 55 1.03759 0 - 34.5177
213 3 58 55 1.03272 0 - 34.5177
214 3 58 55 1.02789 0 - 34.5177
215 3 58 55 1.02311 0 - 34.5177
216 3 58 55 1.01837 0 - 34.5177
217 3 58 55 1.01368 0 - 34.5177
218 3 58 55 1.00903 0 - 34.5177
219 3 58 55 1.00442 0 - 34.5177
2013-01-31 19:29:15.311388min lat: 0.070066 max lat: 170.042 avg lat: 34.5177
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
220 3 58 55 0.999856 0 - 34.5177
221 3 58 55 0.995332 0 - 34.5177
222 3 58 55 0.990849 0 - 34.5177
223 3 58 55 0.986405 0 - 34.5177
224 3 58 55 0.982002 0 - 34.5177
225 3 58 55 0.977637 0 - 34.5177
226 3 58 55 0.973312 0 - 34.5177
227 3 58 55 0.969024 0 - 34.5177
228 3 58 55 0.964774 0 - 34.5177
229 3 58 55 0.960561 0 - 34.5177
230 3 58 55 0.956384 0 - 34.5177
231 3 58 55 0.952244 0 - 34.5177
232 3 58 55 0.94814 0 - 34.5177
233 3 58 55 0.944071 0 - 34.5177
234 3 58 55 0.940036 0 - 34.5177
235 3 58 55 0.936036 0 - 34.5177
236 3 58 55 0.93207 0 - 34.5177
237 3 58 55 0.928137 0 - 34.5177
238 3 58 55 0.924237 0 - 34.5177
239 3 58 55 0.92037 0 - 34.5177
2013-01-31 19:29:35.314257min lat: 0.070066 max lat: 170.042 avg lat: 34.5177
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
240 3 58 55 0.916535 0 - 34.5177
241 3 58 55 0.912732 0 - 34.5177
242 3 58 55 0.90896 0 - 34.5177
243 3 58 55 0.90522 0 - 34.5177
244 3 58 55 0.90151 0 - 34.5177
245 3 58 55 0.89783 0 - 34.5177
246 3 58 55 0.89418 0 - 34.5177
247 3 58 55 0.89056 0 - 34.5177
248 3 58 55 0.886969 0 - 34.5177
249 3 58 55 0.883407 0 - 34.5177
250 3 58 55 0.879874 0 - 34.5177
251 3 58 55 0.876368 0 - 34.5177
252 3 58 55 0.87289 0 - 34.5177
253 3 58 55 0.86944 0 - 34.5177
254 3 58 55 0.866017 0 - 34.5177
255 3 58 55 0.862621 0 - 34.5177
256 3 58 55 0.859252 0 - 34.5177
257 3 58 55 0.855908 0 - 34.5177
258 3 58 55 0.852591 0 - 34.5177
259 3 58 55 0.849299 0 - 34.5177
2013-01-31 19:29:55.317178min lat: 0.070066 max lat: 170.042 avg lat: 34.5177
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
260 3 58 55 0.846032 0 - 34.5177
261 3 58 55 0.842791 0 - 34.5177
262 3 58 55 0.839574 0 - 34.5177
263 3 58 55 0.836382 0 - 34.5177
264 3 58 55 0.833213 0 - 34.5177
265 3 58 55 0.830069 0 - 34.5177
266 3 58 55 0.826949 0 - 34.5177
267 3 58 55 0.823851 0 - 34.5177
268 3 58 55 0.820777 0 - 34.5177
269 3 58 55 0.817726 0 - 34.5177
270 3 58 55 0.814698 0 - 34.5177
271 3 58 55 0.811691 0 - 34.5177
272 3 58 55 0.808707 0 - 34.5177
273 3 58 55 0.805745 0 - 34.5177
274 3 58 55 0.802804 0 - 34.5177
275 3 58 55 0.799885 0 - 34.5177
276 3 58 55 0.796987 0 - 34.5177
277 3 58 55 0.79411 0 - 34.5177
278 3 58 55 0.791253 0 - 34.5177
279 3 58 55 0.788417 0 - 34.5177
2013-01-31 19:30:15.320112min lat: 0.070066 max lat: 170.042 avg lat: 34.5177
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
280 3 58 55 0.785601 0 - 34.5177
281 3 58 55 0.782805 0 - 34.5177
282 3 58 55 0.780029 0 - 34.5177
283 3 58 55 0.777273 0 - 34.5177
284 3 58 55 0.774536 0 - 34.5177
285 3 58 55 0.771819 0 - 34.5177
286 3 58 55 0.76912 0 - 34.5177
287 3 58 55 0.76644 0 - 34.5177
288 3 58 55 0.763779 0 - 34.5177
289 3 58 55 0.761136 0 - 34.5177
290 3 58 55 0.758511 0 - 34.5177
291 3 58 55 0.755905 0 - 34.5177
292 3 58 55 0.753316 0 - 34.5177
293 3 58 55 0.750745 0 - 34.5177
294 3 58 55 0.748191 0 - 34.5177
295 3 58 55 0.745655 0 - 34.5177
296 3 58 55 0.743136 0 - 34.5177
297 3 58 55 0.740634 0 - 34.5177
298 3 58 55 0.738149 0 - 34.5177
Total time run: 298.962451
Total writes made: 58
Write size: 4194304
Bandwidth (MB/sec): 0.776
Stddev Bandwidth: 9.48424
Max bandwidth (MB/sec): 164
Min bandwidth (MB/sec): 0
Average Latency: 48.189
Stddev Latency: 87.3356
Max latency: 298.961
Min latency: 0.070066
Below is the output of ceph -w
root@ceph-2:/data/osd.21/current# ceph -w
health HEALTH_OK
monmap e1: 1 mons at {ceph1=192.101.11.120:6789/0}, election epoch 2, quorum 0 ceph1
osdmap e92: 120 osds: 120 up, 120 in
pgmap v1375: 23232 pgs: 23232 active+clean; 0 bytes data, 8163 MB used, 83787 GB / 83795 GB avail
mdsmap e1: 0/0/1 up
2013-01-31 19:25:07.898527 mon.0 [INF] pgmap v1375: 23232 pgs: 23232 active+clean; 0 bytes data, 8163 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:35.931347 mon.0 [INF] pgmap v1376: 23232 pgs: 23232 active+clean; 8192 KB data, 8163 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:36.999098 mon.0 [INF] pgmap v1377: 23232 pgs: 23232 active+clean; 57344 KB data, 8175 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:38.074841 mon.0 [INF] pgmap v1378: 23232 pgs: 23232 active+clean; 69632 KB data, 8179 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:39.201283 mon.0 [INF] pgmap v1379: 23232 pgs: 23232 active+clean; 90112 KB data, 8211 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:40.243968 mon.0 [INF] pgmap v1380: 23232 pgs: 23232 active+clean; 132 MB data, 8284 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:41.303049 mon.0 [INF] pgmap v1381: 23232 pgs: 23232 active+clean; 164 MB data, 8332 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:25:45.630344 mon.0 [INF] pgmap v1382: 23232 pgs: 23232 active+clean; 164 MB data, 8336 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:01.793346 mon.0 [INF] pgmap v1383: 23232 pgs: 23232 active+clean; 164 MB data, 8340 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:05.293101 osd.25 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
2013-01-31 19:26:05.293110 osd.25 [WRN] slow request 30.005692 seconds old, received at 2013-01-31 19:25:35.287275: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
2013-01-31 19:26:05.600533 osd.114 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.107983 secs
2013-01-31 19:26:05.600542 osd.114 [WRN] slow request 30.107983 seconds old, received at 2013-01-31 19:25:35.492449: osd_op(client.4240.0:35 benchmark_data_ceph-1_39426_object34 [write 0~4194304] 0.9abc7139) v4 currently waiting for subops from [68]
2013-01-31 19:26:08.949329 mon.0 [INF] pgmap v1384: 23232 pgs: 23232 active+clean; 164 MB data, 8344 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:10.875142 mon.0 [INF] pgmap v1385: 23232 pgs: 23232 active+clean; 164 MB data, 8348 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:06.358717 osd.516 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.857906 secs
2013-01-31 19:26:06.358727 osd.516 [WRN] slow request 30.857906 seconds old, received at 2013-01-31 19:25:35.500724: osd_op(client.4240.0:38 benchmark_data_ceph-1_39426_object37 [write 0~4194304] 0.666b9194) v4 currently waiting for subops from [612]
2013-01-31 19:26:05.415806 osd.518 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.129037 secs
2013-01-31 19:26:05.415824 osd.518 [WRN] slow request 30.129037 seconds old, received at 2013-01-31 19:25:35.286662: osd_op(client.4240.0:7 benchmark_data_ceph-1_39426_object6 [write 0~4194304] 0.baa4121c) v4 currently waiting for subops from [615]
2013-01-31 19:26:06.857152 osd.414 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.890407 secs
2013-01-31 19:26:06.857162 osd.414 [WRN] slow request 30.890407 seconds old, received at 2013-01-31 19:25:35.966641: osd_op(client.4240.0:57 benchmark_data_ceph-1_39426_object56 [write 0~4194304] 0.2b691f11) v4 currently waiting for subops from [620]
2013-01-31 19:26:27.376824 mon.0 [INF] pgmap v1386: 23232 pgs: 23232 active+clean; 168 MB data, 8356 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:28.552513 mon.0 [INF] pgmap v1387: 23232 pgs: 23232 active+clean; 172 MB data, 8360 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:29.578534 mon.0 [INF] pgmap v1388: 23232 pgs: 23232 active+clean; 176 MB data, 8364 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:30.829351 mon.0 [INF] pgmap v1389: 23232 pgs: 23232 active+clean; 176 MB data, 8368 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:33.155823 mon.0 [INF] pgmap v1390: 23232 pgs: 23232 active+clean; 180 MB data, 8368 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:35.298090 osd.25 [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.010747 secs
2013-01-31 19:26:35.298102 osd.25 [WRN] slow request 60.010747 seconds old, received at 2013-01-31 19:25:35.287275: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
2013-01-31 19:26:35.605458 osd.114 [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.112920 secs
2013-01-31 19:26:35.605469 osd.114 [WRN] slow request 60.112920 seconds old, received at 2013-01-31 19:25:35.492449: osd_op(client.4240.0:35 benchmark_data_ceph-1_39426_object34 [write 0~4194304] 0.9abc7139) v4 currently waiting for subops from [68]
2013-01-31 19:26:38.552453 mon.0 [INF] pgmap v1391: 23232 pgs: 23232 active+clean; 184 MB data, 8368 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:40.524479 mon.0 [INF] pgmap v1392: 23232 pgs: 23232 active+clean; 188 MB data, 8372 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:36.363526 osd.516 [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.862743 secs
2013-01-31 19:26:36.363534 osd.516 [WRN] slow request 60.862743 seconds old, received at 2013-01-31 19:25:35.500724: osd_op(client.4240.0:38 benchmark_data_ceph-1_39426_object37 [write 0~4194304] 0.666b9194) v4 currently waiting for subops from [612]
2013-01-31 19:26:35.420666 osd.518 [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.133929 secs
2013-01-31 19:26:35.420677 osd.518 [WRN] slow request 60.133929 seconds old, received at 2013-01-31 19:25:35.286662: osd_op(client.4240.0:7 benchmark_data_ceph-1_39426_object6 [write 0~4194304] 0.baa4121c) v4 currently waiting for subops from [615]
2013-01-31 19:26:44.813429 mon.0 [INF] pgmap v1393: 23232 pgs: 23232 active+clean; 192 MB data, 8372 MB used, 83787 GB / 83795 GB avail
2013-01-31 19:26:36.861749 osd.414 [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.895043 secs
2013-01-31 19:26:36.861761 osd.414 [WRN] slow request 60.895043 seconds old, received at 2013-01-31 19:25:35.966641: osd_
* Re: Slow request in XFS
2013-01-31 11:34 Slow request in XFS Chen, Xiaoxi
@ 2013-01-31 12:43 ` Sage Weil
2013-01-31 15:15 ` Jim Schutt
0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2013-01-31 12:43 UTC (permalink / raw)
To: Chen, Xiaoxi; +Cc: ceph-devel
Hi-
Can you reproduce this with logs? It looks like there are a few ops that
are hanging for a very long time, but there isn't enough information here
except to point to osds 610, 612, 615, and 68...
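A minimal sketch of the kind of logging that usually helps here, assuming
the stock config file and the usual OSD debug knobs; treat the exact levels
as a suggestion:
    # in ceph.conf on the suspect OSD hosts, then restart the OSDs
    [osd]
        debug osd = 20
        debug ms = 1
        debug filestore = 20
    # or injected at runtime, without a restart
    ceph osd tell \* injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'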
thanks!
s
On Thu, 31 Jan 2013, Chen, Xiaoxi wrote:
> [original message with full rados bench output and ceph -w log snipped]
* Re: Slow request in XFS
2013-01-31 12:43 ` Sage Weil
@ 2013-01-31 15:15 ` Jim Schutt
2013-01-31 20:00 ` Sage Weil
0 siblings, 1 reply; 7+ messages in thread
From: Jim Schutt @ 2013-01-31 15:15 UTC (permalink / raw)
To: Sage Weil; +Cc: Chen, Xiaoxi, ceph-devel
On 01/31/2013 05:43 AM, Sage Weil wrote:
> Hi-
>
> Can you reproduce this with logs? It looks like there are a few ops that
> are hanging for a very long time, but there isn't enough information here
> except to point to osds 610, 612, 615, and 68...
FWIW, I have a small pile of disks with bad sections in
them - these sections aren't bad enough that writes fail,
but they are bad enough that throughput drops by a factor
of ~20.
Do OSDs already collect statistics on, say, op commit
elapsed time (assuming that's the statistic most directly
related to slow writes on only some sections of a disk),
in a way that could be used to diagnose such disks?
If not, is there enough structure already in place that
it would be easy to add?
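One crude way to map such slow regions from userspace, as a sketch;
/dev/sdX is a placeholder device and the offsets are arbitrary:
    # read 256 MB at a series of offsets across the disk and watch for
    # spots where the reported throughput collapses; iflag=direct
    # bypasses the page cache so the numbers reflect the platters
    for off_gb in 0 100 200 300 400 500 600 700 800; do
        echo "offset ${off_gb} GB:"
        dd if=/dev/sdX of=/dev/null bs=1M count=256 \
           skip=$((off_gb * 1024)) iflag=direct 2>&1 | tail -n1
    done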
-- Jim
>
> thanks!
> s
* Re: Slow request in XFS
2013-01-31 15:15 ` Jim Schutt
@ 2013-01-31 20:00 ` Sage Weil
2013-01-31 20:41 ` Jim Schutt
2013-02-01 8:39 ` Chen, Xiaoxi
0 siblings, 2 replies; 7+ messages in thread
From: Sage Weil @ 2013-01-31 20:00 UTC (permalink / raw)
To: Jim Schutt; +Cc: Chen, Xiaoxi, ceph-devel
On Thu, 31 Jan 2013, Jim Schutt wrote:
> On 01/31/2013 05:43 AM, Sage Weil wrote:
> > Hi-
> >
> > Can you reproduce this with logs? It looks like there are a few ops that
> > are hanging for a very long time, but there isn't enough information here
> > except to point to osds 610, 612, 615, and 68...
>
> FWIW, I have a small pile of disks with bad sections in
> them - these sections aren't bad enough that writes fail,
> but they are bad enough that throughput drops by a factor
> of ~20.
>
> Do OSDs already collect statistics on, say, op commit
> elapsed time (assuming that's the statistic most directly
> related to slow writes on only some sections of a disk),
> in a way that could be used to diagnose such disks?
>
> If not, is there enough structure already in place that
> it would be easy to add?
They're tracking it internally, but it gets averaged into the totals
before a user gets to see any per-request latencies. The per-daemon
totals are available via the admin socket 'perf dump' command. Have you
looked at that information yet?
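As a concrete sketch, assuming the default admin socket path and using
osd.25 from the log above as an example:
    ceph --admin-daemon /var/run/ceph/ceph-osd.25.asok perf dump | python -mjson.tool
    # the filestore section (e.g. journal_latency, apply_latency) carries
    # the averaged commit-time totals as avgcount/sum pairs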
sage
* Re: Slow request in XFS
2013-01-31 20:00 ` Sage Weil
@ 2013-01-31 20:41 ` Jim Schutt
2013-02-01 8:39 ` Chen, Xiaoxi
1 sibling, 0 replies; 7+ messages in thread
From: Jim Schutt @ 2013-01-31 20:41 UTC (permalink / raw)
To: Sage Weil; +Cc: Chen, Xiaoxi, ceph-devel
On 01/31/2013 01:00 PM, Sage Weil wrote:
> On Thu, 31 Jan 2013, Jim Schutt wrote:
>> On 01/31/2013 05:43 AM, Sage Weil wrote:
>>> Hi-
>>>
>>> Can you reproduce this with logs? It looks like there are a few ops that
>>> are hanging for a very long time, but there isn't enough information here
>>> except to point to osds 610, 612, 615, and 68...
>>
>> FWIW, I have a small pile of disks with bad sections in
>> them - these sections aren't bad enough that writes fail,
>> but they are bad enough that throughput drops by a factor
>> of ~20.
>>
>> Do OSDs already collect statistics on, say, op commit
>> elapsed time (assuming that's the statistic most directly
>> related to slow writes on only some sections of a disk),
>> in a way that could be used to diagnose such disks?
>>
>> If not, is there enough structure already in place that
>> it would be easy to add?
>
> They're tracking it internally, but it gets averaged into the totals
> before a user gets to see any per-request latencies. The per-daemon
> totals are available via the admin socket 'perf dump' command. Have you
> looked at that information yet?
Not yet - thanks for the pointer.
I'll take a look, and see what happens when I
put one of my questionable disks back in.
Thanks!
-- Jim
>
> sage
>
>
* RE: Slow request in XFS
2013-01-31 20:00 ` Sage Weil
2013-01-31 20:41 ` Jim Schutt
@ 2013-02-01 8:39 ` Chen, Xiaoxi
2013-02-01 12:19 ` Sage Weil
1 sibling, 1 reply; 7+ messages in thread
From: Chen, Xiaoxi @ 2013-02-01 8:39 UTC (permalink / raw)
To: Sage Weil, Jim Schutt; +Cc: ceph-devel
Is it normal to have so much PG scrubbing? I didn't tune anything.
2013-02-01 16:37:53.943933 osd.12 [INF] 1.4a8 deep-scrub ok
2013-02-01 16:37:54.379463 osd.14 [INF] 1.93 deep-scrub ok
2013-02-01 16:37:55.076349 osd.210 [INF] 2.4a2 deep-scrub ok
2013-02-01 16:37:56.379866 osd.14 [INF] 1.96 deep-scrub ok
2013-02-01 16:37:56.664761 osd.15 [INF] 1.a0c deep-scrub ok
2013-02-01 16:37:56.908169 osd.120 [INF] 2.39 deep-scrub ok
2013-02-01 16:37:56.944525 osd.12 [INF] 1.4ad deep-scrub ok
2013-02-01 16:37:57.789605 osd.119 [INF] 1.25 deep-scrub ok
2013-02-01 16:37:58.076872 osd.210 [INF] 2.4f5 deep-scrub ok
2013-02-01 16:37:58.380193 osd.14 [INF] 1.ea deep-scrub ok
2013-02-01 16:37:58.789817 osd.119 [INF] 1.65 deep-scrub ok
2013-02-01 16:37:59.174758 osd.13 [INF] 1.4ec deep-scrub ok
2013-02-01 16:37:59.945553 osd.12 [INF] 1.4b5 deep-scrub ok
2013-02-01 16:38:00.757881 osd.28 [INF] 1.5d6 deep-scrub ok
2013-02-01 16:38:00.790282 osd.119 [INF] 1.6b deep-scrub ok
2013-02-01 16:38:00.945527 osd.12 [INF] 1.4c4 deep-scrub ok
2013-02-01 16:38:01.665956 osd.15 [INF] 1.a1e deep-scrub ok
2013-02-01 16:38:02.666142 osd.15 [INF] 2.2d deep-scrub ok
2013-02-01 16:38:02.790525 osd.119 [INF] 1.b7 deep-scrub ok
2013-02-01 16:38:02.880595 osd.28 [INF] 1.609 deep-scrub ok
2013-02-01 16:38:03.077917 osd.210 [INF] 2.50f deep-scrub ok
2013-02-01 16:38:03.622409 mon.0 [INF] pgmap v524: 7872 pgs: 81 active, 7710 active+clean, 81 peering; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:03.666447 osd.15 [INF] 2.31 deep-scrub ok
2013-02-01 16:38:04.731588 mon.0 [INF] pgmap v525: 7872 pgs: 75 active, 7716 active+clean, 81 peering; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:37:55.688988 osd.29 [INF] 1.82f deep-scrub ok
2013-02-01 16:37:56.142424 osd.216 [INF] 2.4fc deep-scrub ok
2013-02-01 16:37:58.689627 osd.29 [INF] 1.831 deep-scrub ok
2013-02-01 16:37:59.689851 osd.29 [INF] 1.845 deep-scrub ok
2013-02-01 16:38:01.143157 osd.216 [INF] 2.50d deep-scrub ok
2013-02-01 16:38:01.690207 osd.29 [INF] 1.882 deep-scrub ok
2013-02-01 16:38:02.143704 osd.216 [INF] 2.551 deep-scrub ok
2013-02-01 16:38:02.690510 osd.29 [INF] 1.8be deep-scrub ok
2013-02-01 16:38:03.143961 osd.216 [INF] 2.561 deep-scrub ok
2013-02-01 16:38:04.144356 osd.216 [INF] 2.61c deep-scrub ok
2013-02-01 16:38:05.890791 mon.0 [INF] pgmap v526: 7872 pgs: 75 active, 7716 active+clean, 81 peering; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:06.974976 mon.0 [INF] pgmap v527: 7872 pgs: 77 active, 7716 active+clean, 78 peering, 1 active+clean+scrubbing+deep; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:08.092490 mon.0 [INF] pgmap v528: 7872 pgs: 77 active, 7716 active+clean, 78 peering, 1 active+clean+scrubbing+deep; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:02.318896 osd.220 [INF] 2.3b9 deep-scrub ok
2013-02-01 16:38:02.618443 osd.22 [INF] 2.3b0 deep-scrub ok
2013-02-01 16:38:04.319381 osd.220 [INF] 2.3cf deep-scrub ok
2013-02-01 16:38:04.618932 osd.22 [INF] 2.3b6 deep-scrub ok
2013-02-01 16:38:09.160545 mon.0 [INF] pgmap v529: 7872 pgs: 77 active, 7716 active+clean, 78 peering, 1 active+clean+scrubbing+deep; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:10.244711 mon.0 [INF] pgmap v530: 7872 pgs: 77 active, 7716 active+clean, 78 peering, 1 active+clean+scrubbing+deep; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:00.789838 osd.24 [INF] 1.734 deep-scrub ok
2013-02-01 16:38:01.543266 osd.25 [INF] 1.827 deep-scrub ok
2013-02-01 16:38:02.543662 osd.25 [INF] 1.834 deep-scrub ok
2013-02-01 16:38:03.085778 osd.211 [INF] 2.a24 deep-scrub ok
2013-02-01 16:38:03.790503 osd.24 [INF] 1.737 deep-scrub ok
2013-02-01 16:38:06.791036 osd.24 [INF] 1.74f deep-scrub ok
2013-02-01 16:38:08.544790 osd.25 [INF] 1.83a deep-scrub ok
2013-02-01 16:38:10.545088 osd.25 [INF] 1.83f deep-scrub ok
2013-02-01 16:38:11.453853 mon.0 [INF] pgmap v531: 7872 pgs: 77 active, 7716 active+clean, 78 peering, 1 active+clean+scrubbing+deep; 0 bytes data, 1738 MB used, 27930 GB / 27931 GB avail
2013-02-01 16:38:02.064245 osd.111 [INF] 1.522 deep-scrub ok
2013-02-01 16:38:02.169805 osd.110 [INF] 1.608 deep-scrub ok
2013-02-01 16:38:05.064739 osd.111 [INF] 1.526 deep-scrub ok
2013-02-01 16:38:05.098158 osd.17 [INF] 1.282 deep-scrub ok
2013-02-01 16:38:06.065128 osd.111 [INF] 1.53d deep-scrub ok
2013-02-01 16:38:07.098443 osd.17 [INF] 1.2dc deep-scrub ok
2013-02-01 16:38:09.098901 osd.17 [INF] 1.2e5 deep-scrub ok
2013-02-01 16:38:09.236673 osd.16 [INF] 0.1ea deep-scrub ok
2013-02-01 16:38:03.124982 osd.27 [INF] 1.545 deep-scrub ok
2013-02-01 16:38:04.078118 osd.210 [INF] 2.52d deep-scrub ok
2013-02-01 16:38:04.125158 osd.213 [INF] 2.813 deep-scrub ok
2013-02-01 16:38:04.381415 osd.14 [INF] 1.103 deep-scrub ok
2013-02-01 16:38:04.666744 osd.15 [INF] 2.a8 deep-scrub ok
2013-02-01 16:38:05.078598 osd.210 [INF] 2.534 deep-scrub ok
2013-02-01 16:38:05.125319 osd.213 [INF] 2.83d deep-scrub ok
2013-02-01 16:38:05.300474 osd.117 [INF] 1.3a0 deep-scrub ok
2013-02-01 16:38:05.381625 osd.14 [INF] 1.108 deep-scrub ok
2013-02-01 16:38:05.881223 osd.28 [INF] 1.634 deep-scrub ok
2013-02-01 16:38:06.078723 osd.210 [INF] 2.552 deep-scrub ok
2013-02-01 16:38:06.093018 osd.113 [INF] 1.349 deep-scrub ok
2013-02-01 16:38:06.125485 osd.27 [INF] 1.546 deep-scrub ok
2013-02-01 16:38:06.300759 osd.117 [INF] 1.3fc deep-scrub ok
2013-02-01 16:38:06.881229 osd.28 [INF] 1.647 deep-scrub ok
2013-02-01 16:38:06.946823 osd.12 [INF] 1.514 deep-scrub ok
2013-02-01 16:38:08.093758 osd.113 [INF] 1.3af deep-scrub ok
2013-02-01 16:38:08.791689 osd.119 [INF] 1.10a deep-scrub ok
2013-02-01 16:38:09.079355 osd.210 [INF] 2.55c deep-scrub ok
2013-02-01 16:38:09.125886 osd.27 [INF] 1.54a deep-scrub ok
2013-02-01 16:38:09.747373 osd.118 [INF] 0.992 deep-scrub ok
2013-02-01 16:38:09.792101 osd.119 [INF] 1.11a deep-scrub ok
2013-02-01 16:38:10.079707 osd.210 [INF] 2.5a0 deep-scrub ok
2013-02-01 16:38:10.747615 osd.118 [INF] 0.9a6 deep-scrub ok
2013-02-01 16:38:11.126305 osd.213 [INF] 2.85e deep-scrub ok
2013-02-01 16:38:11.390611 osd.214 [INF] 2.266 deep-scrub ok
2013-02-01 16:38:11.692015 osd.29 [INF] 1.8c6 deep-scrub ok
2013-02-01 16:38:11.947688 osd.12 [INF] 1.523 deep-scrub ok
2013-02-01 16:38:12.301511 osd.117 [INF] 1.442 deep-scrub ok
2013-02-01 16:38:12.390220 osd.214 [INF] 2.26c deep-scrub ok
-----Original Message-----
From: Sage Weil [mailto:sage@inktank.com]
Sent: February 1, 2013 4:01
To: Jim Schutt
Cc: Chen, Xiaoxi; ceph-devel@vger.kernel.org
Subject: Re: Slow request in XFS
[quoted message snipped]
* RE: Slow request in XFS
2013-02-01 8:39 ` Chen, Xiaoxi
@ 2013-02-01 12:19 ` Sage Weil
0 siblings, 0 replies; 7+ messages in thread
From: Sage Weil @ 2013-02-01 12:19 UTC (permalink / raw)
To: Chen, Xiaoxi; +Cc: Jim Schutt, ceph-devel
On Fri, 1 Feb 2013, Chen, Xiaoxi wrote:
> Is it normal to have so much PG scrubbing? I didn't tune anything.
For a new cluster, you'll see a bunch of these as the initial scrub
happens. After that, each scrub runs once per configurable period: 24h for
the regular scrub (metadata) and a week for the deep scrub (which rereads
all data). Run
ceph --show-config | grep scrub
to see the options and their defaults.
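For reference, a minimal ceph.conf sketch of those knobs; the option names
are the standard scrub settings, and the values shown are my understanding
of the defaults, worth confirming against the command above:
    [osd]
        osd scrub max interval = 86400        # regular scrub at most 24h apart
        osd deep scrub interval = 604800      # deep scrub roughly weekly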
sage
> [deep-scrub log and quoted earlier exchange snipped]
Thread overview: 7+ messages
2013-01-31 11:34 Slow request in XFS Chen, Xiaoxi
2013-01-31 12:43 ` Sage Weil
2013-01-31 15:15 ` Jim Schutt
2013-01-31 20:00 ` Sage Weil
2013-01-31 20:41 ` Jim Schutt
2013-02-01 8:39 ` Chen, Xiaoxi
2013-02-01 12:19 ` Sage Weil