* Commit and Apply latency on nautilus
@ 2019-09-29 11:05 Alex Litvak
2019-09-30 16:13 ` Marc Roos
[not found] ` <H000007100150ea4.1569860022.sx.f1-outsourcing.eu*@MHS>
0 siblings, 2 replies; 7+ messages in thread
From: Alex Litvak @ 2019-09-29 11:05 UTC (permalink / raw)
To: ceph-users-idqoXFIVOFJgJs9I8MT0rw; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA
Hello everyone,
I am running a number of parallel benchmark tests against the cluster that should be ready to go to production.
I enabled Prometheus to monitor various metrics, and while the cluster stays healthy through the tests, with no errors or slow requests,
I noticed apply/commit latency jumping between 40 and 600 ms on multiple SSDs. At the same time, op_read and op_write stay on average below 0.25 ms in the worst-case scenario.
I am running Nautilus 14.2.2, all BlueStore, no separate NVMe devices for WAL/DB, 6 SSDs per node (Dell PowerEdge R440), all drives Seagate Nytro 1551, OSDs spread across 6 nodes, running in
containers. Each node has plenty of RAM, with utilization around 25 GB during the benchmark runs.
Here are the benchmarks run from 6 client systems in parallel, repeated for each block size in <4k,16k,128k,4M>.
On an rbd-mapped partition local to each client:
fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70
On a mounted cephfs volume, with each client storing its test file(s) in its own sub-directory:
fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70
dbench -t 30 30
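[Editor's sketch, not part of the original post: the fio sweep described above can be scripted so each block size runs the same job. The target path below is hypothetical; adjust it for the rbd or cephfs mount on each client.]

```python
# Build the fio command line for each block size in the sweep described
# above. The filename target is a hypothetical placeholder.

def fio_cmd(bs, target="/mnt/test/fio.dat"):
    """Return the fio randrw command for one block size as an argv list."""
    return [
        "fio", "--name=randrw", "--ioengine=libaio", "--iodepth=4",
        "--rw=randrw", f"--bs={bs}", "--direct=1", "--size=2G",
        "--numjobs=8", "--runtime=300", "--group_reporting",
        "--time_based", "--rwmixread=70", f"--filename={target}",
    ]

block_sizes = ["4k", "16k", "128k", "4M"]
commands = [fio_cmd(bs) for bs in block_sizes]
```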
Could you please let me know if the huge jump in apply and commit latency is justified in my case, and whether I can do anything to improve or fix it. Below is some additional cluster info.
Thank you,
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
6 ssd 1.74609 1.00000 1.7 TiB 93 GiB 92 GiB 240 MiB 784 MiB 1.7 TiB 5.21 0.90 44 up
12 ssd 1.74609 1.00000 1.7 TiB 98 GiB 97 GiB 118 MiB 906 MiB 1.7 TiB 5.47 0.95 40 up
18 ssd 1.74609 1.00000 1.7 TiB 102 GiB 101 GiB 123 MiB 901 MiB 1.6 TiB 5.73 0.99 47 up
24 ssd 3.49219 1.00000 3.5 TiB 222 GiB 221 GiB 134 MiB 890 MiB 3.3 TiB 6.20 1.07 96 up
30 ssd 3.49219 1.00000 3.5 TiB 213 GiB 212 GiB 151 MiB 873 MiB 3.3 TiB 5.95 1.03 93 up
35 ssd 3.49219 1.00000 3.5 TiB 203 GiB 202 GiB 301 MiB 723 MiB 3.3 TiB 5.67 0.98 100 up
5 ssd 1.74609 1.00000 1.7 TiB 103 GiB 102 GiB 123 MiB 901 MiB 1.6 TiB 5.78 1.00 49 up
11 ssd 1.74609 1.00000 1.7 TiB 109 GiB 108 GiB 63 MiB 961 MiB 1.6 TiB 6.09 1.05 46 up
17 ssd 1.74609 1.00000 1.7 TiB 104 GiB 103 GiB 205 MiB 819 MiB 1.6 TiB 5.81 1.01 50 up
23 ssd 3.49219 1.00000 3.5 TiB 210 GiB 209 GiB 168 MiB 856 MiB 3.3 TiB 5.86 1.01 86 up
29 ssd 3.49219 1.00000 3.5 TiB 204 GiB 203 GiB 272 MiB 752 MiB 3.3 TiB 5.69 0.98 92 up
34 ssd 3.49219 1.00000 3.5 TiB 198 GiB 197 GiB 295 MiB 729 MiB 3.3 TiB 5.54 0.96 85 up
4 ssd 1.74609 1.00000 1.7 TiB 119 GiB 118 GiB 16 KiB 1024 MiB 1.6 TiB 6.67 1.15 50 up
10 ssd 1.74609 1.00000 1.7 TiB 95 GiB 94 GiB 183 MiB 841 MiB 1.7 TiB 5.31 0.92 46 up
16 ssd 1.74609 1.00000 1.7 TiB 102 GiB 101 GiB 122 MiB 902 MiB 1.6 TiB 5.72 0.99 50 up
22 ssd 3.49219 1.00000 3.5 TiB 218 GiB 217 GiB 109 MiB 915 MiB 3.3 TiB 6.11 1.06 91 up
28 ssd 3.49219 1.00000 3.5 TiB 198 GiB 197 GiB 343 MiB 681 MiB 3.3 TiB 5.54 0.96 95 up
33 ssd 3.49219 1.00000 3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 TiB 5.53 0.96 85 up
1 ssd 1.74609 1.00000 1.7 TiB 101 GiB 100 GiB 222 MiB 802 MiB 1.6 TiB 5.63 0.97 49 up
7 ssd 1.74609 1.00000 1.7 TiB 102 GiB 101 GiB 153 MiB 871 MiB 1.6 TiB 5.69 0.99 46 up
13 ssd 1.74609 1.00000 1.7 TiB 106 GiB 105 GiB 67 MiB 957 MiB 1.6 TiB 5.96 1.03 42 up
19 ssd 3.49219 1.00000 3.5 TiB 206 GiB 205 GiB 179 MiB 845 MiB 3.3 TiB 5.77 1.00 83 up
25 ssd 3.49219 1.00000 3.5 TiB 195 GiB 194 GiB 352 MiB 672 MiB 3.3 TiB 5.45 0.94 97 up
31 ssd 3.49219 1.00000 3.5 TiB 201 GiB 200 GiB 305 MiB 719 MiB 3.3 TiB 5.62 0.97 90 up
0 ssd 1.74609 1.00000 1.7 TiB 110 GiB 109 GiB 29 MiB 995 MiB 1.6 TiB 6.14 1.06 43 up
3 ssd 1.74609 1.00000 1.7 TiB 109 GiB 108 GiB 28 MiB 996 MiB 1.6 TiB 6.07 1.05 41 up
9 ssd 1.74609 1.00000 1.7 TiB 103 GiB 102 GiB 149 MiB 875 MiB 1.6 TiB 5.76 1.00 52 up
15 ssd 3.49219 1.00000 3.5 TiB 209 GiB 208 GiB 253 MiB 771 MiB 3.3 TiB 5.83 1.01 98 up
21 ssd 3.49219 1.00000 3.5 TiB 199 GiB 198 GiB 302 MiB 722 MiB 3.3 TiB 5.56 0.96 90 up
27 ssd 3.49219 1.00000 3.5 TiB 208 GiB 207 GiB 226 MiB 798 MiB 3.3 TiB 5.81 1.00 95 up
2 ssd 1.74609 1.00000 1.7 TiB 96 GiB 95 GiB 158 MiB 866 MiB 1.7 TiB 5.35 0.93 45 up
8 ssd 1.74609 1.00000 1.7 TiB 106 GiB 105 GiB 132 MiB 892 MiB 1.6 TiB 5.91 1.02 50 up
14 ssd 1.74609 1.00000 1.7 TiB 96 GiB 95 GiB 180 MiB 844 MiB 1.7 TiB 5.35 0.92 46 up
20 ssd 3.49219 1.00000 3.5 TiB 221 GiB 220 GiB 156 MiB 868 MiB 3.3 TiB 6.18 1.07 101 up
26 ssd 3.49219 1.00000 3.5 TiB 206 GiB 205 GiB 332 MiB 692 MiB 3.3 TiB 5.76 1.00 92 up
32 ssd 3.49219 1.00000 3.5 TiB 221 GiB 220 GiB 88 MiB 936 MiB 3.3 TiB 6.18 1.07 91 up
TOTAL 94 TiB 5.5 TiB 5.4 TiB 6.4 GiB 30 GiB 89 TiB 5.78
MIN/MAX VAR: 0.90/1.15 STDDEV: 0.30
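[Editor's sketch: the %USE and VAR columns above are derived quantities. %USE is RAW USE over SIZE, and VAR is each OSD's %USE relative to the cluster-wide average (5.78 here). A minimal illustration using two rows from the table; with only two rows, the mean, and hence VAR, will not match the cluster-wide values.]

```python
# Reproduce the %USE column from two rows of the `ceph osd df` output
# above: osd.6 (93 GiB raw of 1.74609 TiB) and osd.24 (222 GiB of
# 3.49219 TiB). VAR is %USE divided by the mean %USE of the sample.

rows = {6: (93, 1.74609 * 1024), 24: (222, 3.49219 * 1024)}  # osd: (raw GiB, size GiB)

pct_use = {osd: used / size * 100 for osd, (used, size) in rows.items()}
mean = sum(pct_use.values()) / len(pct_use)   # sample mean, not the cluster-wide 5.78
var = {osd: p / mean for osd, p in pct_use.items()}
```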
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph -s
cluster:
id: 9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
health: HEALTH_OK
services:
mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la (age 9w)
mgr: storage2n2-la(active, since 9w), standbys: storage2n1-la, storage2n3-la
mds: cephfs:1 {0=storage2n6-la=up:active} 1 up:standby-replay 1 up:standby
osd: 36 osds: 36 up (since 9w), 36 in (since 9w)
data:
pools: 3 pools, 832 pgs
objects: 4.18M objects, 1.8 TiB
usage: 5.5 TiB used, 89 TiB / 94 TiB avail
pgs: 832 active+clean
io:
client: 852 B/s rd, 15 KiB/s wr, 4 op/s rd, 2 op/s wr
* Re: Commit and Apply latency on nautilus
2019-09-29 11:05 Commit and Apply latency on nautilus Alex Litvak
@ 2019-09-30 16:13 ` Marc Roos
[not found] ` <H000007100150ea4.1569860022.sx.f1-outsourcing.eu*@MHS>
1 sibling, 0 replies; 7+ messages in thread
From: Marc Roos @ 2019-09-30 16:13 UTC (permalink / raw)
To: alexander.v.litvak, ceph-users; +Cc: ceph-devel
What parameters exactly are you using? I want to do a similar test on
Luminous before I upgrade to Nautilus. I have quite a lot (74+) of these:
type_instance=Osd.opBeforeDequeueOpLat
type_instance=Osd.opBeforeQueueOpLat
type_instance=Osd.opLatency
type_instance=Osd.opPrepareLatency
type_instance=Osd.opProcessLatency
type_instance=Osd.opRLatency
type_instance=Osd.opRPrepareLatency
type_instance=Osd.opRProcessLatency
type_instance=Osd.opRwLatency
type_instance=Osd.opRwPrepareLatency
type_instance=Osd.opRwProcessLatency
type_instance=Osd.opWLatency
type_instance=Osd.opWPrepareLatency
type_instance=Osd.opWProcessLatency
type_instance=Osd.subopLatency
type_instance=Osd.subopWLatency
...
...
-----Original Message-----
From: Alex Litvak [mailto:alexander.v.litvak-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
Sent: zondag 29 september 2019 13:06
To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [ceph-users] Commit and Apply latency on nautilus
[original message quoted in full -- trimmed; see above]
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
* Re: Commit and Apply latency on nautilus
[not found] ` <H000007100150ea4.1569860022.sx.f1-outsourcing.eu*@MHS>
@ 2019-09-30 18:45 ` Sasha Litvak
[not found] ` <CALi_L4-fNi=gP9sOCWPNcok9tVG=K-rtER68n1s9bkZzwuGhEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Sasha Litvak @ 2019-09-30 18:45 UTC (permalink / raw)
To: Marc Roos; +Cc: ceph-users, ceph-devel
In my case, I am using premade Prometheus-sourced dashboards in Grafana.
For individual OSD latency, the queries look like this:
irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
The other ones use
ceph_osd_commit_latency_ms
ceph_osd_apply_latency_ms
and graph their distribution over time.
Also, average OSD op latency
avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
Average OSD apply + commit latency
avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
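[Editor's sketch: the rate/irate queries above all reduce to the same arithmetic, the delta of a `*_latency_sum` counter divided by the delta of its matching `*_latency_count` counter over the same window, which yields average seconds per op. The sample numbers below are fabricated for illustration.]

```python
# Average per-op latency between two Prometheus scrapes of a Ceph
# latency sum/count counter pair. Mirrors the irate quotient above.

def avg_latency(sum_t0, sum_t1, count_t0, count_t1):
    """Average seconds per op completed between the two scrapes."""
    ops = count_t1 - count_t0
    if ops <= 0:
        return 0.0          # no ops completed in the window
    return (sum_t1 - sum_t0) / ops

# 0.6 s of accumulated write latency over 1200 ops -> 0.5 ms per op
lat = avg_latency(sum_t0=10.0, sum_t1=10.6, count_t0=5000, count_t1=6200)
```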
On Mon, Sep 30, 2019 at 11:13 AM Marc Roos <M.Roos-yG4tGvGIC004hcxptnrGZodd74u8MsAO@public.gmane.org> wrote:
> [quoted message trimmed]
* Re: Commit and Apply latency on nautilus
[not found] ` <CALi_L4-fNi=gP9sOCWPNcok9tVG=K-rtER68n1s9bkZzwuGhEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-30 19:44 ` Paul Emmerich
[not found] ` <CAD9yTbEzPJwAqVgn2fWtjZCG8zFnAgjvtMOnO-+FJd4XQx364Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Paul Emmerich @ 2019-09-30 19:44 UTC (permalink / raw)
To: Sasha Litvak; +Cc: ceph-users, ceph-devel
BTW: commit and apply latency are the exact same thing since
BlueStore, so don't bother looking at both.
In fact you should mostly be looking at the op_*_latency counters
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
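[Editor's sketch: the op_*_latency counters Paul points to are exposed per OSD by `ceph daemon osd.N perf dump` as avgcount/sum pairs. The snippet below turns one such pair into a lifetime-average latency in ms; the JSON is fabricated and only mimics the shape of that output.]

```python
# Parse fabricated perf-dump-style JSON and compute the lifetime
# average latency in milliseconds from an {avgcount, sum} pair.
import json

def avg_ms(counter):
    """Lifetime average in ms: total seconds over total op count."""
    if counter["avgcount"] == 0:
        return 0.0
    return counter["sum"] / counter["avgcount"] * 1000.0

perf = json.loads("""
{"osd": {"op_w_latency": {"avgcount": 2000, "sum": 3.0},
         "op_r_latency": {"avgcount": 8000, "sum": 1.6}}}
""")
w_ms = avg_ms(perf["osd"]["op_w_latency"])   # 1.5 ms
r_ms = avg_ms(perf["osd"]["op_r_latency"])   # 0.2 ms
```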
On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak
<alexander.v.litvak@gmail.com> wrote:
> [quoted message trimmed]
* Re: Commit and Apply latency on nautilus
[not found] ` <CAD9yTbEzPJwAqVgn2fWtjZCG8zFnAgjvtMOnO-+FJd4XQx364Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-10-01 0:12 ` Sasha Litvak
[not found] ` <CALi_L4_dzsu3r4FGpc6K6Ce3iz6JAZmKaVTB8LXrLqVNOH1ong-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Sasha Litvak @ 2019-10-01 0:12 UTC (permalink / raw)
To: Paul Emmerich; +Cc: ceph-users, ceph-devel
At this point, I have run out of ideas. I changed the nr_requests and readahead
parameters from 128 to 1024 and from 128 to 4096 respectively, and tuned the
nodes for throughput performance. However, I still get high latency during benchmark
testing. I attempted to disable the cache on the SSDs:
for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done
and I think it did not make things better at all. I have H740 and H730
controllers with the drives in HBA mode.
Other than converting them one by one to RAID0, I am not sure what else I
can try.
Any suggestions?
On Mon, Sep 30, 2019 at 2:45 PM Paul Emmerich <paul.emmerich-iZXOWb+A65A@public.gmane.org>
wrote:
> BTW: commit and apply latency are the exact same thing since
> BlueStore, so don't bother looking at both.
>
> In fact you should mostly be looking at the op_*_latency counters
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak
> <alexander.v.litvak-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >
> > In my case, I am using premade Prometheus-sourced dashboards in Grafana.
> >
> > For individual OSD latency, the queries look like this:
> >
> > irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on
> (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
> > irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on
> (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
> >
> > The other ones use
> >
> > ceph_osd_commit_latency_ms
> > ceph_osd_apply_latency_ms
> >
> > and graph the distribution of it over time
> >
> > Also, average OSD op latency
> >
> > avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) /
> rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
> > avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) /
> rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
> >
> > Average OSD apply + commit latency
> > avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
> > avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
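The sum/count queries above compute a windowed average: divide the delta of the cumulative latency sum by the delta of the op count over the window. A minimal offline sanity check of that arithmetic (the counter samples are made up for illustration, not real cluster data):

```shell
# Average latency over a scrape window from two cumulative counter samples,
# mirroring irate(op_w_latency_sum[1m]) / irate(op_w_latency_count[1m]):
# 0.5 s of accrued latency across 2000 completed ops = 0.25 ms per op.
awk 'BEGIN {
  d_sum   = 10.5 - 10.0;       # latency_sum delta (seconds)
  d_count = 102000 - 100000;   # latency_count delta (ops completed)
  printf "%.5f ms\n", 1000 * d_sum / d_count
}'
```

This prints `0.25000 ms`, i.e. the per-op average over the window, which is why sum and count must always come from the same counter pair.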
> >
> >
> > On Mon, Sep 30, 2019 at 11:13 AM Marc Roos <M.Roos-yG4tGvGIC004hcxptnrGZodd74u8MsAO@public.gmane.org>
> wrote:
> >>
> >>
> >> What parameters are you using exactly? I want to do a similar test on
> >> Luminous before I upgrade to Nautilus. I have quite a lot of latency
> >> counters (74+):
> >>
> >> type_instance=Osd.opBeforeDequeueOpLat
> >> type_instance=Osd.opBeforeQueueOpLat
> >> type_instance=Osd.opLatency
> >> type_instance=Osd.opPrepareLatency
> >> type_instance=Osd.opProcessLatency
> >> type_instance=Osd.opRLatency
> >> type_instance=Osd.opRPrepareLatency
> >> type_instance=Osd.opRProcessLatency
> >> type_instance=Osd.opRwLatency
> >> type_instance=Osd.opRwPrepareLatency
> >> type_instance=Osd.opRwProcessLatency
> >> type_instance=Osd.opWLatency
> >> type_instance=Osd.opWPrepareLatency
> >> type_instance=Osd.opWProcessLatency
> >> type_instance=Osd.subopLatency
> >> type_instance=Osd.subopWLatency
> >> ...
> >> ...
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Alex Litvak [mailto:alexander.v.litvak-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
> >> Sent: zondag 29 september 2019 13:06
> >> To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> >> Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: [ceph-users] Commit and Apply latency on nautilus
> >>
> >> Hello everyone,
> >>
> >> I am running a number of parallel benchmark tests against a cluster
> >> that should be ready to go to production.
> >> I enabled Prometheus to monitor various metrics, and while the cluster
> >> stays healthy through the tests, with no errors or slow requests,
> >> I noticed an apply / commit latency jumping between 40 and 600 ms on
> >> multiple SSDs. At the same time, op_read and op_write are on average
> >> below 0.25 ms in the worst-case scenario.
> >>
> >> I am running Nautilus 14.2.2, all BlueStore, no separate NVMe devices
> >> for WAL/DB, 6 SSDs per node (Dell PowerEdge R440), all drives Seagate
> >> Nytro 1551, OSDs spread across 6 nodes, running in containers. Each
> >> node has plenty of RAM, with utilization ~25 GB during the benchmark
> >> runs.
> >>
> >> Here are benchmarks being run from 6 client systems in parallel,
> >> repeating the test for each block size in <4k,16k,128k,4M>.
> >>
> >> On an RBD-mapped partition local to each client:
> >>
> >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
> >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
> >> --group_reporting --time_based --rwmixread=70
> >>
> >> On a mounted CephFS volume, with each client storing test file(s) in
> >> its own sub-directory:
> >>
> >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
> >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
> >> --group_reporting --time_based --rwmixread=70
> >>
> >> dbench -t 30 30
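The per-block-size runs above can be driven by a small loop. A sketch that only prints the commands rather than running them (the `randrw-$bs` job-name suffix is my addition; pipe the output to `sh` inside the target directory to actually execute):

```shell
#!/bin/sh
# Emit one fio invocation per tested block size; pipe to sh to run them.
for bs in 4k 16k 128k 4M; do
  echo "fio --name=randrw-$bs --ioengine=libaio --iodepth=4 --rw=randrw" \
       "--bs=$bs --direct=1 --size=2G --numjobs=8 --runtime=300" \
       "--group_reporting --time_based --rwmixread=70"
done
```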
> >>
> >> Could you please let me know if the huge jump in applied and committed
> >> latency is justified in my case, and whether I can do anything to
> >> improve or fix it. Below is some additional cluster info.
> >>
> >> Thank you,
> >>
> >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
> >> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
> >>  6   ssd 1.74609 1.00000  1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7 TiB 5.21 0.90  44 up
> >> 12   ssd 1.74609 1.00000  1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7 TiB 5.47 0.95  40 up
> >> 18   ssd 1.74609 1.00000  1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6 TiB 5.73 0.99  47 up
> >> 24   ssd 3.49219 1.00000  3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3 TiB 6.20 1.07  96 up
> >> 30   ssd 3.49219 1.00000  3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3 TiB 5.95 1.03  93 up
> >> 35   ssd 3.49219 1.00000  3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3 TiB 5.67 0.98 100 up
> >>  5   ssd 1.74609 1.00000  1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6 TiB 5.78 1.00  49 up
> >> 11   ssd 1.74609 1.00000  1.7 TiB 109 GiB 108 GiB  63 MiB  961 MiB 1.6 TiB 6.09 1.05  46 up
> >> 17   ssd 1.74609 1.00000  1.7 TiB 104 GiB 103 GiB 205 MiB  819 MiB 1.6 TiB 5.81 1.01  50 up
> >> 23   ssd 3.49219 1.00000  3.5 TiB 210 GiB 209 GiB 168 MiB  856 MiB 3.3 TiB 5.86 1.01  86 up
> >> 29   ssd 3.49219 1.00000  3.5 TiB 204 GiB 203 GiB 272 MiB  752 MiB 3.3 TiB 5.69 0.98  92 up
> >> 34   ssd 3.49219 1.00000  3.5 TiB 198 GiB 197 GiB 295 MiB  729 MiB 3.3 TiB 5.54 0.96  85 up
> >>  4   ssd 1.74609 1.00000  1.7 TiB 119 GiB 118 GiB  16 KiB 1024 MiB 1.6 TiB 6.67 1.15  50 up
> >> 10   ssd 1.74609 1.00000  1.7 TiB  95 GiB  94 GiB 183 MiB  841 MiB 1.7 TiB 5.31 0.92  46 up
> >> 16   ssd 1.74609 1.00000  1.7 TiB 102 GiB 101 GiB 122 MiB  902 MiB 1.6 TiB 5.72 0.99  50 up
> >> 22   ssd 3.49219 1.00000  3.5 TiB 218 GiB 217 GiB 109 MiB  915 MiB 3.3 TiB 6.11 1.06  91 up
> >> 28   ssd 3.49219 1.00000  3.5 TiB 198 GiB 197 GiB 343 MiB  681 MiB 3.3 TiB 5.54 0.96  95 up
> >> 33   ssd 3.49219 1.00000  3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 TiB 5.53 0.96  85 up
> >>  1   ssd 1.74609 1.00000  1.7 TiB 101 GiB 100 GiB 222 MiB  802 MiB 1.6 TiB 5.63 0.97  49 up
> >>  7   ssd 1.74609 1.00000  1.7 TiB 102 GiB 101 GiB 153 MiB  871 MiB 1.6 TiB 5.69 0.99  46 up
> >> 13   ssd 1.74609 1.00000  1.7 TiB 106 GiB 105 GiB  67 MiB  957 MiB 1.6 TiB 5.96 1.03  42 up
> >> 19   ssd 3.49219 1.00000  3.5 TiB 206 GiB 205 GiB 179 MiB  845 MiB 3.3 TiB 5.77 1.00  83 up
> >> 25   ssd 3.49219 1.00000  3.5 TiB 195 GiB 194 GiB 352 MiB  672 MiB 3.3 TiB 5.45 0.94  97 up
> >> 31   ssd 3.49219 1.00000  3.5 TiB 201 GiB 200 GiB 305 MiB  719 MiB 3.3 TiB 5.62 0.97  90 up
> >>  0   ssd 1.74609 1.00000  1.7 TiB 110 GiB 109 GiB  29 MiB  995 MiB 1.6 TiB 6.14 1.06  43 up
> >>  3   ssd 1.74609 1.00000  1.7 TiB 109 GiB 108 GiB  28 MiB  996 MiB 1.6 TiB 6.07 1.05  41 up
> >>  9   ssd 1.74609 1.00000  1.7 TiB 103 GiB 102 GiB 149 MiB  875 MiB 1.6 TiB 5.76 1.00  52 up
> >> 15   ssd 3.49219 1.00000  3.5 TiB 209 GiB 208 GiB 253 MiB  771 MiB 3.3 TiB 5.83 1.01  98 up
> >> 21   ssd 3.49219 1.00000  3.5 TiB 199 GiB 198 GiB 302 MiB  722 MiB 3.3 TiB 5.56 0.96  90 up
> >> 27   ssd 3.49219 1.00000  3.5 TiB 208 GiB 207 GiB 226 MiB  798 MiB 3.3 TiB 5.81 1.00  95 up
> >>  2   ssd 1.74609 1.00000  1.7 TiB  96 GiB  95 GiB 158 MiB  866 MiB 1.7 TiB 5.35 0.93  45 up
> >>  8   ssd 1.74609 1.00000  1.7 TiB 106 GiB 105 GiB 132 MiB  892 MiB 1.6 TiB 5.91 1.02  50 up
> >> 14   ssd 1.74609 1.00000  1.7 TiB  96 GiB  95 GiB 180 MiB  844 MiB 1.7 TiB 5.35 0.92  46 up
> >> 20   ssd 3.49219 1.00000  3.5 TiB 221 GiB 220 GiB 156 MiB  868 MiB 3.3 TiB 6.18 1.07 101 up
> >> 26   ssd 3.49219 1.00000  3.5 TiB 206 GiB 205 GiB 332 MiB  692 MiB 3.3 TiB 5.76 1.00  92 up
> >> 32   ssd 3.49219 1.00000  3.5 TiB 221 GiB 220 GiB  88 MiB  936 MiB 3.3 TiB 6.18 1.07  91 up
> >> TOTAL 94 TiB 5.5 TiB 5.4 TiB 6.4 GiB 30 GiB 89 TiB 5.78
> >> MIN/MAX VAR: 0.90/1.15 STDDEV: 0.30
> >>
> >>
> >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph -s
> >> cluster:
> >> id: 9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
> >> health: HEALTH_OK
> >>
> >> services:
> >> mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la
> >> (age 9w)
> >> mgr: storage2n2-la(active, since 9w), standbys: storage2n1-la,
> >> storage2n3-la
> >> mds: cephfs:1 {0=storage2n6-la=up:active} 1 up:standby-replay 1
> >> up:standby
> >> osd: 36 osds: 36 up (since 9w), 36 in (since 9w)
> >>
> >> data:
> >> pools: 3 pools, 832 pgs
> >> objects: 4.18M objects, 1.8 TiB
> >> usage: 5.5 TiB used, 89 TiB / 94 TiB avail
> >> pgs: 832 active+clean
> >>
> >> io:
> >> client: 852 B/s rd, 15 KiB/s wr, 4 op/s rd, 2 op/s wr
> >>
> >>
> >>
> >>
> >>
>
* Re: Commit and Apply latency on nautilus
[not found] ` <CALi_L4_dzsu3r4FGpc6K6Ce3iz6JAZmKaVTB8LXrLqVNOH1ong-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-10-01 14:54 ` Robert LeBlanc
[not found] ` <CAANLjFqW6WE_vfYd=39GVV8DgGJge+qBr52tHj+t0P3Aap4rBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Robert LeBlanc @ 2019-10-01 14:54 UTC (permalink / raw)
To: Sasha Litvak; +Cc: ceph-users, ceph-devel
On Mon, Sep 30, 2019 at 5:12 PM Sasha Litvak
<alexander.v.litvak-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> At this point, I have run out of ideas. I changed the nr_requests and readahead parameters from 128 to 1024 and from 128 to 4096 respectively, and tuned the nodes to a performance-throughput profile. However, I still get high latency during benchmark testing. I attempted to disable the cache on the SSDs with
>
> for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done
>
> and I think it made things no better at all. I have H740 and H730 controllers with the drives in HBA mode.
>
> Other than converting them one by one to RAID0, I am not sure what else I can try.
>
> Any suggestions?
If you haven't already tried this, add this to your ceph.conf and
restart your OSDs, this should help bring down the variance in latency
(It will be the default in Octopus):
osd op queue = wpq
osd op queue cut off = high
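On Nautilus, the same two settings can also be stored in the centralized config database instead of editing ceph.conf on every node. A sketch (note: osd_op_queue is not a runtime-changeable option, so the OSDs still need a restart to pick it up):

```shell
# Store the settings in the cluster config DB (Nautilus 'ceph config').
ceph config set osd osd_op_queue wpq
ceph config set osd osd_op_queue_cut_off high
# Confirm what the OSDs will use after restart:
ceph config get osd osd_op_queue
ceph config get osd osd_op_queue_cut_off
```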
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
* Re: Commit and Apply latency on nautilus
[not found] ` <CAANLjFqW6WE_vfYd=39GVV8DgGJge+qBr52tHj+t0P3Aap4rBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-10-01 14:57 ` Robert LeBlanc
0 siblings, 0 replies; 7+ messages in thread
From: Robert LeBlanc @ 2019-10-01 14:57 UTC (permalink / raw)
To: Sasha Litvak; +Cc: ceph-users, ceph-devel
On Tue, Oct 1, 2019 at 7:54 AM Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>
> On Mon, Sep 30, 2019 at 5:12 PM Sasha Litvak
> <alexander.v.litvak-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >
> > At this point, I have run out of ideas. I changed the nr_requests and readahead parameters from 128 to 1024 and from 128 to 4096 respectively, and tuned the nodes to a performance-throughput profile. However, I still get high latency during benchmark testing. I attempted to disable the cache on the SSDs with
> >
> > for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done
> >
> > and I think it made things no better at all. I have H740 and H730 controllers with the drives in HBA mode.
> >
> > Other than converting them one by one to RAID0, I am not sure what else I can try.
> >
> > Any suggestions?
>
> If you haven't already tried this, add this to your ceph.conf and
> restart your OSDs, this should help bring down the variance in latency
> (It will be the default in Octopus):
>
> osd op queue = wpq
> osd op queue cut off = high
I should clarify. This will reduce the variance in latency for client
OPs. If this counter also includes recovery/backfill/deep_scrub OPs,
then the latency can still be high, as these settings make
recovery/backfill/deep_scrub less impactful to client I/O at the cost
of them possibly being delayed a bit.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
end of thread, other threads:[~2019-10-01 14:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-29 11:05 Commit and Apply latency on nautilus Alex Litvak
2019-09-30 16:13 ` Marc Roos
[not found] ` <H000007100150ea4.1569860022.sx.f1-outsourcing.eu*@MHS>
2019-09-30 18:45 ` Sasha Litvak
[not found] ` <CALi_L4-fNi=gP9sOCWPNcok9tVG=K-rtER68n1s9bkZzwuGhEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-30 19:44 ` Paul Emmerich
[not found] ` <CAD9yTbEzPJwAqVgn2fWtjZCG8zFnAgjvtMOnO-+FJd4XQx364Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-10-01 0:12 ` Sasha Litvak
[not found] ` <CALi_L4_dzsu3r4FGpc6K6Ce3iz6JAZmKaVTB8LXrLqVNOH1ong-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-10-01 14:54 ` Robert LeBlanc
[not found] ` <CAANLjFqW6WE_vfYd=39GVV8DgGJge+qBr52tHj+t0P3Aap4rBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-10-01 14:57 ` Robert LeBlanc