* Re: Running a separate fio process for each disk?
From: Allen Schade @ 2015-11-13 22:04 UTC
To: Jens Axboe; +Cc: fio
I'm actually launching a completely separate instance of fio for each disk.
I want to say it's because when I ran them under the same fio process, I had
issues with the JSON output files merging the data in an unexpected way.
The fio version is 2.2.6.
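A minimal sketch of the one-fio-per-disk setup described above. It assumes the job file takes its target device from a DEV environment variable (fio expands ${DEV} inside job files) and gives each instance its own JSON report. The launch_per_disk helper and the DRY_RUN switch are hypothetical, not something from this thread.

```shell
# Sketch: start one independent fio process per disk, each writing its own
# JSON report, so results from different disks are never merged.
# Assumptions (hypothetical): the job file names its target as
# filename=${DEV}, which fio fills in from the environment, and setting
# DRY_RUN=1 only prints the commands instead of running them.
launch_per_disk() {
    local jobfile=$1 dev
    shift
    for dev in "$@"; do
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo "DEV=/dev/$dev fio --output-format=json --output=$dev.json $jobfile"
        else
            DEV=/dev/$dev fio --output-format=json --output="$dev.json" "$jobfile" &
        fi
    done
    # Wait for every background fio instance (returns at once in dry-run mode).
    wait
}
```

For example, DRY_RUN=1 launch_per_disk write_random_1M.fio sdb sdc prints the two fio command lines without touching any disk.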
* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-13 22:06 UTC
To: Allen Schade; +Cc: fio
On 11/13/2015 03:04 PM, Allen Schade wrote:
> I'm actually launching a completely separate instance of fio for each
> disk. I want to say it's because when I ran them under the same fio
> process I had issues with the json files merging the data in an
> unexpected way.
OK - in any case, that should be fine.
> Version is 2.2.6
Could you try the current -git version? I vaguely remember a clock issue
that could have caused this.
--
Jens Axboe
* Re: Running a separate fio process for each disk?
From: Allen Schade @ 2015-11-20 18:28 UTC
To: Jens Axboe, Caio Villela; +Cc: fio
Jens,
We tried with the latest version and gathered some detailed info.
Hey Caio,
Can you paste your experiment information as a reply here? Also, switch
your email client to plain-text mode, or the vger.kernel.org list server
will reject your email as spam.
* Re: Running a separate fio process for each disk?
From: Caio Villela @ 2015-11-20 19:37 UTC
To: Allen Schade; +Cc: Jens Axboe, fio
Hello Allen and Jens,
Sorry for the long output; it's included in case you want the details.
Here is a simple explanation of the problem. I want to run a 15-minute
random write test using 1 MiB requests and measure throughput and latency.
The problem seems to be that if the test system has a large number of
drives (the system I am testing here has 28), the time accounting goes
bad for some of the processes.
What you see below is that during the first 15 minutes, all disks are
getting hit equally, as they should. Then, after 15 minutes, 15 drives are
still running, and 5 minutes past the specified 15 minutes, one drive is
still running. Looking at the number of I/Os sent to each drive, the ones
that ran during that excess time have many more I/Os. Yet fio still reports
that all drives ran for 15 minutes, although some ran for more than 20
minutes.
We will try running a single process instead of 28 fio instances to see
if the problem goes away.
Details:
I have the following control file to do this:
cat write_random_1M.fio
# Synthetic Latency Analysis Experimental FIO Control File
# Copyright (C) 2015 CODENAME
# Shared with VENDOR under NDA.
# Authors: Caio Villela
# --------------------Global Settings--------------------#
[global]
runtime=900
ioengine=sync
time_based=1
norandommap=1
bwavgtime=5000
direct=1
thread=1
do_verify=0
numjobs=1
continue_on_error=io
ramp_time=10
# ----------------Pure Random Workloads----------------#
# Random Write Workload
# 100% Writes @1024k
[random_write_1024k]
rw=randwrite
bs=1048576
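As an illustration of the single-process run mentioned above (not part of the original experiment), a hypothetical helper could generate one job file with a section per drive while reusing the global settings of write_random_1M.fio. The section names and the filename=/dev/sdX lines are assumptions.

```shell
# Sketch: emit a fio job file that drives every disk from ONE fio process,
# with one job section per disk. The [global] values are copied from
# write_random_1M.fio; section names and filename lines are illustrative.
make_single_process_job() {
    cat <<'EOF'
[global]
runtime=900
ioengine=sync
time_based=1
norandommap=1
bwavgtime=5000
direct=1
thread=1
do_verify=0
numjobs=1
continue_on_error=io
ramp_time=10
rw=randwrite
bs=1048576
EOF
    local dev
    for dev in "$@"; do
        # One section per drive; without group_reporting, fio reports each job separately.
        printf '\n[random_write_1024k_%s]\nfilename=/dev/%s\n' "$dev" "$dev"
    done
}
```

Usage would look like make_single_process_job sd{b..z} sda{a..c} > all_disks.fio, followed by a single fio invocation on all_disks.fio. Because thread=1 is set, the per-drive jobs run as threads inside that one process.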
I monitor the number of I/Os per drive with this script:
cat disp_total_iops_512k.sh
for i in b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac; do
    # print "sdX " followed by the sum of fields 2-12 on the 524288 line(s)
    # of this drive's write_request_histo
    echo -n "sd$i "
    grep 524288 /sys/block/sd$i/write_request_histo |
        awk '{print $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9 + $10 + $11 + $12}'
done
As you can see, all drives start with zero I/Os.
****___ EXPERIMENT WITH 28 DRIVES ____*****
Starts at 10:07:33, should end at 10:22:33
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:07:33 PST 2015
sdb 0
sdc 0
sdd 0
sde 0
sdf 0
sdg 0
sdh 0
sdi 0
sdj 0
sdk 0
sdl 0
sdm 0
sdn 0
sdo 0
sdp 0
sdq 0
sdr 0
sds 0
sdt 0
sdu 0
sdv 0
sdw 0
sdx 0
sdy 0
sdz 0
sdaa 0
sdab 0
sdac 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:09:24 PST 2015
sdb 21732
sdc 21835
sdd 21655
sde 21907
sdf 21949
sdg 21753
sdh 21863
sdi 21745
sdj 21679
sdk 21621
sdl 21894
sdm 21437
sdn 21555
sdo 21492
sdp 21677
sdq 21717
sdr 21736
sds 21350
sdt 21909
sdu 22082
sdv 22148
sdw 21778
sdx 21380
sdy 21770
sdz 21749
sdaa 22161
sdab 21798
sdac 21485
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:13:33 PST 2015
sdb 71328
sdc 71811
sdd 70860
sde 71680
sdf 71743
sdg 71171
sdh 71843
sdi 71512
sdj 71328
sdk 70977
sdl 71614
sdm 70715
sdn 70622
sdo 70424
sdp 71338
sdq 70990
sdr 71458
sds 70082
sdt 71995
sdu 72433
sdv 72504
sdw 71687
sdx 70402
sdy 71299
sdz 71376
sdaa 72729
sdab 71302
sdac 70775
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:22:25 PST 2015
sdb 177075
sdc 178134
sdd 176100
sde 178115
sdf 177950
sdg 176743
sdh 178478
sdi 177500
sdj 177239
sdk 176294
sdl 177594
sdm 175325
sdn 175189
sdo 174962
sdp 177076
sdq 176177
sdr 177336
sds 173842
sdt 178753
sdu 179514
sdv 179681
sdw 178229
sdx 174532
sdy 176912
sdz 176911
sdaa 180506
sdab 177070
sdac 175550
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:22:31 PST 2015 ---->> SHOULD END HERE !!!!
sdb 178204
sdc 179207
sdd 177169
sde 179235
sdf 179028
sdg 177822
sdh 179602
sdi 178644
sdj 178364
sdk 177397
sdl 178707
sdm 176411
sdn 176273
sdo 176054
sdp 178182
sdq 177258
sdr 178428
sds 174951
sdt 179844
sdu 180614
sdv 180792
sdw 179292
sdx 175625
sdy 178005
sdz 177983
sdaa 181612
sdab 178120
sdac 176646
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 41.50 1678.00 0.00 3356 0
sdb 170.00 0.00 87040.00 0 174080
sdc 186.50 0.00 95488.00 0 190976
sdd 210.00 0.00 107520.00 0 215040
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 221.00 0.00 113152.00 0 226304
sdi 208.50 0.00 106752.00 0 213504
sdj 201.00 0.00 102912.00 0 205824
sdk 209.50 0.00 107264.00 0 214528
sdl 204.50 0.00 104704.00 0 209408
sdm 206.00 0.00 105472.00 0 210944
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 194.50 0.00 99584.00 0 199168
sds 192.00 0.00 98304.00 0 196608
sdt 198.00 0.00 101376.00 0 202752
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 211.00 0.00 108032.00 0 216064
sdy 205.00 0.00 104960.00 0 209920
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 205.50 0.00 105216.00 0 210432
sdac 0.00 0.00 0.00 0 0
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 55.50 1842.00 32.00 3684 64
sdb 186.50 0.00 95488.00 0 190976
sdc 0.00 0.00 0.00 0 0
sdd 212.50 0.00 108800.00 0 217600
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 217.00 0.00 111104.00 0 222208
sdi 206.50 0.00 105728.00 0 211456
sdj 195.50 0.00 100096.00 0 200192
sdk 201.00 0.00 102912.00 0 205824
sdl 199.00 0.00 101888.00 0 203776
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 181.00 0.00 92672.00 0 185344
sds 192.50 0.00 98560.00 0 197120
sdt 207.50 0.00 106240.00 0 212480
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 211.00 0.00 108032.00 0 216064
sdy 190.00 0.00 97280.00 0 194560
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 202.50 0.00 103680.00 0 207360
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:23:12 PST 2015
sdb 186323
sdc 183418
sdd 185207
sde 182338
sdf 182166
sdg 181010
sdh 187822
sdi 186776
sdj 186457
sdk 184790
sdl 186859
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 186584
sds 181994
sdt 188087
sdu 183868
sdv 183976
sdw 182562
sdx 183664
sdy 186096
sdz 181240
sdaa 184892
sdab 186197
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 17.00 362.00 56.00 724 112
sdb 207.50 0.00 106240.00 0 212480
sdc 0.00 0.00 0.00 0 0
sdd 190.50 0.00 97536.00 0 195072
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 189.50 0.00 97024.00 0 194048
sdi 0.00 0.00 0.00 0 0
sdj 208.00 0.00 106496.00 0 212992
sdk 0.00 0.00 0.00 0 0
sdl 195.50 0.00 100096.00 0 200192
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 213.00 0.00 109056.00 0 218112
sds 0.00 0.00 0.00 0 0
sdt 208.00 0.00 106496.00 0 212992
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 199.50 0.00 102144.00 0 204288
sdy 189.50 0.00 97024.00 0 194048
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 215.00 0.00 110080.00 0 220160
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:23:29 PST 2015
sdb 189869
sdc 183418
sdd 188716
sde 182338
sdf 182166
sdg 181010
sdh 191382
sdi 187256
sdj 190025
sdk 184790
sdl 190415
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 190136
sds 181994
sdt 191684
sdu 183868
sdv 183976
sdw 182562
sdx 187147
sdy 189641
sdz 181240
sdaa 184892
sdab 189731
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 41.50 1712.00 0.00 3424 0
sdb 206.00 0.00 105472.00 0 210944
sdc 0.00 0.00 0.00 0 0
sdd 198.50 0.00 101632.00 0 203264
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 189.00 0.00 96768.00 0 193536
sdi 0.00 0.00 0.00 0 0
sdj 199.50 0.00 102144.00 0 204288
sdk 0.00 0.00 0.00 0 0
sdl 203.00 0.00 103936.00 0 207872
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 201.00 0.00 102912.00 0 205824
sds 0.00 0.00 0.00 0 0
sdt 195.00 0.00 99840.00 0 199680
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 202.00 0.00 103424.00 0 206848
sdy 31.50 0.00 16128.00 0 32256
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 0.00 0.00 0.00 0 0
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:23:55 PST 2015
sdb 194927
sdc 183418
sdd 193720
sde 182338
sdf 182166
sdg 181010
sdh 196466
sdi 187256
sdj 195111
sdk 184790
sdl 195463
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 195167
sds 181994
sdt 196775
sdu 183868
sdv 183976
sdw 182562
sdx 192105
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 61.50 2086.00 0.00 4172 0
sdb 189.00 0.00 96768.00 0 193536
sdc 0.00 0.00 0.00 0 0
sdd 200.00 0.00 102400.00 0 204800
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdi 0.00 0.00 0.00 0 0
sdj 0.00 0.00 0.00 0 0
sdk 0.00 0.00 0.00 0 0
sdl 184.50 0.00 94464.00 0 188928
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 215.50 0.00 110336.00 0 220672
sds 0.00 0.00 0.00 0 0
sdt 223.00 0.00 114176.00 0 228352
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 205.00 0.00 104960.00 0 209920
sdy 0.00 0.00 0.00 0 0
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 0.00 0.00 0.00 0 0
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:24:36 PST 2015
sdb 203100
sdc 183418
sdd 201814
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 203577
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 203330
sds 181994
sdt 204983
sdu 183868
sdv 183976
sdw 182562
sdx 200165
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 18.50 2060.00 8.00 4120 16
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 204.50 0.00 104704.00 0 209408
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdi 0.00 0.00 0.00 0 0
sdj 0.00 0.00 0.00 0 0
sdk 0.00 0.00 0.00 0 0
sdl 209.50 0.00 107264.00 0 214528
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 208.00 0.00 106496.00 0 212992
sds 0.00 0.00 0.00 0 0
sdt 210.00 0.00 107520.00 0 215040
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 0.00 0.00 0.00 0 0
sdy 0.00 0.00 0.00 0 0
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 0.00 0.00 0.00 0 0
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:25:09 PST 2015
sdb 207564
sdc 183418
sdd 208503
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 210362
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 210114
sds 181994
sdt 211744
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 19.00 2050.00 12.00 4100 24
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 195.00 0.00 99840.00 0 199680
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdi 0.00 0.00 0.00 0 0
sdj 0.00 0.00 0.00 0 0
sdk 0.00 0.00 0.00 0 0
sdl 197.00 0.00 100864.00 0 201728
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 212.00 0.00 108544.00 0 217088
sds 0.00 0.00 0.00 0 0
sdt 198.00 0.00 101376.00 0 202752
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 0.00 0.00 0.00 0 0
sdy 0.00 0.00 0.00 0 0
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 0.00 0.00 0.00 0 0
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:26:12 PST 2015
sdb 207564
sdc 183418
sdd 220962
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 222883
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 222560
sds 181994
sdt 224287
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 19.50 2128.00 0.00 4256 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 219.00 0.00 112128.00 0 224256
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdi 0.00 0.00 0.00 0 0
sdj 0.00 0.00 0.00 0 0
sdk 0.00 0.00 0.00 0 0
sdl 0.00 0.00 0.00 0 0
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 207.00 0.00 105984.00 0 211968
sds 0.00 0.00 0.00 0 0
sdt 183.50 0.00 93952.00 0 187904
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 0.00 0.00 0.00 0 0
sdy 0.00 0.00 0.00 0 0
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 0.00 0.00 0.00 0 0
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:27:00 PST 2015
sdb 207564
sdc 183418
sdd 230499
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 228202
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 232184
sds 181994
sdt 234098
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 19.00 2056.00 8.00 4112 16
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
sdf 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdi 0.00 0.00 0.00 0 0
sdj 0.00 0.00 0.00 0 0
sdk 0.00 0.00 0.00 0 0
sdl 0.00 0.00 0.00 0 0
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdr 197.00 0.00 100864.00 0 201728
sds 0.00 0.00 0.00 0 0
sdt 0.00 0.00 0.00 0 0
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdx 0.00 0.00 0.00 0 0
sdy 0.00 0.00 0.00 0 0
sdz 0.00 0.00 0.00 0 0
sdaa 0.00 0.00 0.00 0 0
sdab 0.00 0.00 0.00 0 0
sdac 0.00 0.00 0.00 0 0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:27:31 PST 2015
sdb 207564
sdc 183418
sdd 232922
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 228202
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 238277
sds 181994
sdt 238404
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
All done, more than 5 minutes past the 15-minute mark:
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:27:51 PST 2015
sdb 207564
sdc 183418
sdd 232922
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 228202
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 238426
sds 181994
sdt 238404
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
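The by-eye comparison of consecutive snapshots above can be scripted. Here is a minimal sketch that assumes the output of disp_total_iops_512k.sh was saved to files. The still_running helper and the snapshot file names are hypothetical. It lists the drives whose counters are still increasing between two snapshots.

```shell
# Sketch: compare two saved outputs of disp_total_iops_512k.sh and report
# drives whose write count grew, i.e. whose fio job has not finished yet.
# Only "<device> <count>" lines are used; date lines are skipped.
still_running() {
    awk '
        NF == 2 && $2 ~ /^[0-9]+$/ {
            # first file: remember each drive count
            if (FNR == NR) { before[$1] = $2 + 0; next }
            # second file: print drives whose count increased, with the delta
            if (($1 in before) && $2 + 0 > before[$1])
                printf "%s +%d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}
```

Given the 10:27:31 and 10:27:51 snapshots above as snap1 and snap2, still_running snap1 snap2 would report only sdr +149.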
* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-20 19:50 UTC
To: Caio Villela, Allen Schade; +Cc: fio
On 11/20/2015 12:37 PM, Caio Villela wrote:
> Hello Allen and Jens,
>
> Sorry for the long output, this is just in case you want the details.
> Here is a simple explanation for the problem. I want to run a 15 minute
> random write, using 1 Meg requests, and measure throughput and latency.
> What seems to be the problem is that if the test system has a large
> number of drives - the system that I am testing here has 28 drives -
> then the time accounting seems to go bad for some of the processes.
> What you see below is that during the 15 minutes from start, all disks
> are getting hit the same, as they should. Then, after 15 minutes, there
> are 15 drives that are still running.... after 5 minutes over the
> specified 15 minutes, there is still one drive running. Then looking at
> the amount of IOs sent to each drive, the ones that ran on that excess
> time have much more IOs. FIO still reports that all drives ran for 15
> minutes, although some ran for more than 20 minutes.
>
> We will attempt to run a single process instead of 28 instances of FIO
> to see if this goes away.
Could you also check whether adding clocksource=gettimeofday makes any
difference? This sounds very odd.
I assume this was run with fio -git?
--
Jens Axboe
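One way to try this suggestion without hand-editing the original job file is to derive a copy with the option added to its [global] section. This sketch assumes GNU sed, and the with_gettimeofday helper and the _gtod.fio suffix are hypothetical.

```shell
# Sketch: write a copy of a fio job file with "clocksource=gettimeofday"
# inserted right after its [global] line. Assumes GNU sed (one-line "a"
# command) and a job file that has a [global] section.
with_gettimeofday() {
    local jobfile=$1
    sed '/^\[global]/a clocksource=gettimeofday' "$jobfile" > "${jobfile%.fio}_gtod.fio"
}
```

Running with_gettimeofday write_random_1M.fio would produce write_random_1M_gtod.fio. fio's other clocksource values are clock_gettime and cpu.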
* Re: Running a separate fio process for each disk?
From: Akash Verma @ 2015-11-20 22:20 UTC
To: Jens Axboe; +Cc: Caio Villela, Allen Schade, fio
Hi Jens,
The issue is not seen with non-CPU clock sources, or when using a
single process (with individual threads; that's the only single-process
configuration I tried). We only see the issue when using multiple
processes together with the CPU clock source.
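The size of the overrun can be estimated from the figures earlier in the thread. Suppose, as a working hypothesis only (the thread does not establish the root cause), that an affected process's CPU-clock time base was off by a constant factor. Then fio would count 900 s while more wall-clock time passed, which would also fit fio reporting 15 minutes for every drive. This sketch uses the 10:07:33 start and sdr's last observed activity, which fell between 10:27:31 and 10:27:51, and it ignores the 10 s ramp_time. The helper names are hypothetical.

```shell
# Sketch: how far would a constant clock mis-scaling have to be to explain
# the observed overrun? Times are HH:MM:SS; runtime is the configured value.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so fields like "08" are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

clock_scale() {
    local start stop runtime
    start=$(to_seconds "$1"); stop=$(to_seconds "$2"); runtime=$3
    # wall-clock seconds actually spent, and their ratio to the configured runtime
    awk -v w=$(( stop - start )) -v r="$runtime" \
        'BEGIN { printf "wall=%ds scale=%.2f\n", w, w / r }'
}
```

clock_scale 10:07:33 10:27:31 900 gives wall=1198s scale=1.33, and clock_scale 10:07:33 10:27:51 900 gives wall=1218s scale=1.35. That range is roughly in line with the ratio between sdr's final count (238426) and the counts of the drives that stopped on time (179228 to 184892), which is about 1.29 to 1.33.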
* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-21 0:03 UTC
To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio
Hi,
OK, I see. Can you pull the latest -git and then run fio --cpuclock-test
on one of the boxes where you see the issue? Make sure your build
includes commit 5896d827e1e2 or later.
* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-21 0:21 UTC
To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio
And finally, there's a potential fix in commit 99afcdb53dc3. Please try
that commit or later as well, and see if it behaves any better for you.
--
Jens Axboe
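Before rerunning, it may help to confirm that the local fio checkout actually contains the commits named in this thread. This sketch relies on git merge-base --is-ancestor. The helper names and the default ~/fio checkout path are hypothetical.

```shell
# Sketch: only run fio's CPU clock self-test if the local checkout contains
# the commits mentioned in this thread.
has_commit() {
    local repo=$1 rev=$2
    # Exit status 0 if $rev is HEAD or an ancestor of HEAD in $repo.
    git -C "$repo" merge-base --is-ancestor "$rev" HEAD
}

run_cpuclock_test() {
    local repo=${1:-$HOME/fio}   # hypothetical location of the fio source tree
    if has_commit "$repo" 5896d827e1e2 && has_commit "$repo" 99afcdb53dc3; then
        "$repo/fio" --cpuclock-test
    else
        echo "fio checkout in $repo lacks the commits; pull the latest git first" >&2
        return 1
    fi
}
```

After pulling and building, run_cpuclock_test ~/fio would run the self-test only when both commits are in the checkout's history.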
* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-24 15:51 UTC
To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio
Have you tried current -git yet? I think it should work for both
scenarios. It's a silly bug, and it would be great to have confirmation
that it's fixed; then I'll spin a new release.
--
Jens Axboe
* Re: Running a separate fio process for each disk?
From: Akash Verma @ 2015-11-24 20:51 UTC
To: Jens Axboe, Michael Bella; +Cc: Caio Villela, Allen Schade, fio
Sorry for not getting back to you. I didn't get a chance to try the
latest git, and I'm off on vacation soon, so I'm cc'ing Michael and Caio,
who might have a chance to try it out before Thursday. Michael or Caio,
could you try running the two things Jens asked for: the cpuclock test,
using both the fio build we've been using and the latest git; and the
regular multi-process fio run with the latest git?
On Tue, Nov 24, 2015 at 7:51 AM, Jens Axboe <axboe@kernel.dk> wrote:
> Did you try current -git yet? I think it should work for both scenarios.
> It's a silly bug, would be great to have confirmation that it's fixed. Then
> I'll spin a new release.
>
>
>
> On 11/20/2015 05:21 PM, Jens Axboe wrote:
>>
>> And finally, there's a potential fix, if you run commit
>> 99afcdb53dc3 or later. So please do try that as well, and
>> see if that behaves any better for you.
>>
>>
>> On 11/20/2015 05:03 PM, Jens Axboe wrote:
>>>
>>> Hi,
>>>
>>> OK, I see. Can you pull the latest -git, and then run
>>> fio --cpuclock-test on one of the boxes where you see the issue? It
>>> should have commit 5896d827e1e2 or later.
>>>
>>>
>>> On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma <akashv@google.com> wrote:
>>>
>>> Hi Jens,
>>> The issue is not seen with non-cpu clock sources, or when using a
>>> single process (with individual threads, the only config I tried). We
>>> only see the issue when using multiple processes and the cpu clock
>>> source.
>>>
>>> On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>> > On 11/20/2015 12:37 PM, Caio Villela wrote:
>>> >>
>>> >> Hello Allen and Jens,
>>> >>
>>> >> Sorry for the long output, this is just in case you want the details.
>>> >> Here is a simple explanation of the problem. I want to run a 15 minute
>>> >> random write, using 1 MB requests, and measure throughput and latency.
>>> >> The problem seems to be that if the test system has a large number of
>>> >> drives (the system I am testing here has 28), the time accounting goes
>>> >> bad for some of the processes.
>>> >> What you see below is that during the first 15 minutes, all disks
>>> >> are getting hit the same, as they should. Then, after 15 minutes, there
>>> >> are 15 drives that are still running... 5 minutes past the specified
>>> >> 15 minutes, there is still one drive running. Looking at the number of
>>> >> IOs sent to each drive, the ones that ran during that excess time have
>>> >> many more IOs. FIO still reports that all drives ran for 15 minutes,
>>> >> although some ran for more than 20 minutes.
>>> >>
>>> >> We will attempt to run a single process instead of 28 instances of FIO
>>> >> to see if this goes away.
>>> >
>>> >
>>> > Could you also check if adding clocksource=gettimeofday makes any
>>> > difference? This sounds very odd.
>>> >
>>> > Assuming this was run with fio -git?
>>> >
>>> >
>>> > --
>>> > Jens Axboe
>>>
>>>
>>
>>
>
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
2015-11-24 20:51 ` Akash Verma
@ 2015-11-25 1:18 ` Jens Axboe
2015-12-03 18:54 ` Akash Verma
0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-25 1:18 UTC (permalink / raw)
To: Akash Verma, Michael Bella; +Cc: Caio Villela, Allen Schade, fio
No worries, I know this week is a bit more problematic than usual. I'll
hold off on the new release until I know.
On 11/24/2015 01:51 PM, Akash Verma wrote:
> Sorry for not getting back; I didn't get a chance to try the latest
> git, and I'm off on vacation soon. I'm ccing Michael and Caio, who
> might have a chance to try it out before Thursday. Michael or Caio,
> could you try running the two things Jens asked for (the cpuclock test
> with both the FIO version we've been using and the latest from git, and
> the regular multi-process FIO run with the latest git)?
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
2015-11-25 1:18 ` Jens Axboe
@ 2015-12-03 18:54 ` Akash Verma
2015-12-03 18:58 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: Akash Verma @ 2015-12-03 18:54 UTC (permalink / raw)
To: Jens Axboe; +Cc: Michael Bella, Caio Villela, Allen Schade, fio
[-- Attachment #1: Type: text/plain, Size: 4255 bytes --]
Jens, I confirmed that the issue is not seen with the latest FIO (I used
version fio-2.2.12-15-gcdab).
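Akash's version string appears to follow git describe output: the nearest release tag (fio-2.2.12), how many commits sit on top of that tag (15), and an abbreviated commit hash after the g (cdab). Below is a small Python sketch, assuming that format, that splits such a string and checks whether a build is newer than the 2.2.6 release mentioned at the start of the thread. The regular expression and the helper names are hypothetical, and the commit-count comparison is only meaningful between builds of the same tag.

```python
import re
from typing import NamedTuple, Optional, Tuple


class FioVersion(NamedTuple):
    release: Tuple[int, ...]    # e.g. (2, 2, 12)
    commits_ahead: int          # commits on top of the tag; 0 for a release
    abbrev_hash: Optional[str]  # abbreviated commit hash; None for a release


# Assumed format: "fio-X.Y.Z" optionally followed by "-<count>-g<hash>".
_DESCRIBE_RE = re.compile(r"^fio-(\d+(?:\.\d+)*)(?:-(\d+)-g([0-9a-f]+))?$")


def parse_fio_version(s: str) -> FioVersion:
    """Split a git-describe style fio version string into its parts."""
    m = _DESCRIBE_RE.match(s.strip())
    if m is None:
        raise ValueError(f"unrecognized fio version string: {s!r}")
    release = tuple(int(p) for p in m.group(1).split("."))
    ahead = int(m.group(2)) if m.group(2) else 0
    return FioVersion(release, ahead, m.group(3))


def is_newer(a: str, b: str) -> bool:
    """True if build a is past build b: compare release tags first, then
    the commit count (only meaningful when both share the same tag)."""
    va, vb = parse_fio_version(a), parse_fio_version(b)
    return (va.release, va.commits_ahead) > (vb.release, vb.commits_ahead)
```

For example, is_newer("fio-2.2.12-15-gcdab", "fio-2.2.6") is True, which matches the timeline above: the build that no longer showed the problem is several releases past the one the thread started with.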
On Tue, Nov 24, 2015 at 5:18 PM, Jens Axboe <axboe@kernel.dk> wrote:
> No worries, I know this week is a bit more problematic than usual. I'll
> hold off on the new release until I know.
[-- Attachment #2: Type: text/html, Size: 5938 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
2015-12-03 18:54 ` Akash Verma
@ 2015-12-03 18:58 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2015-12-03 18:58 UTC (permalink / raw)
To: Akash Verma; +Cc: Michael Bella, Caio Villela, Allen Schade, fio
Perfect! Thanks for reporting and re-testing.
On 12/03/2015 11:54 AM, Akash Verma wrote:
> Jens, I confirmed that the issue is not seen with the latest FIO (I used
> version fio-2.2.12-15-gcdab).
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2015-12-03 18:58 UTC | newest]
Thread overview: 13+ messages
[not found] <CADp+U7ibiKciX8_cpzGzob4oL-UF-H+W7kYuiujovD0ba=hM6A@mail.gmail.com>
[not found] ` <56464ACC.9030605@kernel.dk>
2015-11-13 22:04 ` Running a separate fio process for each disk? Allen Schade
2015-11-13 22:06 ` Jens Axboe
2015-11-20 18:28 ` Allen Schade
2015-11-20 19:37 ` Caio Villela
2015-11-20 19:50 ` Jens Axboe
2015-11-20 22:20 ` Akash Verma
2015-11-21 0:03 ` Jens Axboe
2015-11-21 0:21 ` Jens Axboe
2015-11-24 15:51 ` Jens Axboe
2015-11-24 20:51 ` Akash Verma
2015-11-25 1:18 ` Jens Axboe
2015-12-03 18:54 ` Akash Verma
2015-12-03 18:58 ` Jens Axboe