All of lore.kernel.org
* Inconsistent Status Intervals
From: Jeffrey Lien @ 2020-09-04 20:16 UTC
  To: fio, bvanassche; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis



Our performance team reported an issue with fio version 3.20: the logging intervals for the iops and bw values are slightly longer than requested, so the timestamps drift further and further over the run time of the job. Eventually the delta gets large enough to cause problems for the Python script that formats and summarizes the fio output. An example of the output is included at the bottom of this email.

I found this commit from Bart in version 3.18 that introduced the status interval variability:
https://github.com/axboe/fio/commit/31eca641ad91634e5ffcf369cd756b0506a700c1
and this later commit from Vincent, which improves the situation but doesn't completely resolve the issue:
https://github.com/axboe/fio/commit/0f77d30d90b809fbf233b9474cc5d17c6bf73541

I looked at both of these patches but am not familiar enough with the fio code to know how best to resolve the issue.  Does anyone have any suggestions on what could be done to make the status intervals consistent?  

Example BW log:

Version 3.15 output:
1000, 802924, 1, 0
2000, 801532, 1, 0
3000, 774084, 1, 0
4000, 423840, 1, 0
5001, 428355, 1, 0
6000, 427228, 1, 0
7000, 424040, 1, 0
8000, 428556, 1, 0
9000, 425144, 1, 0
10000, 430824, 1, 0
11000, 426304, 1, 0
12000, 432148, 1, 0
13000, 422860, 1, 0
14000, 433216, 1, 0
15000, 427520, 1, 0
16000, 428940, 1, 0
17000, 429840, 1, 0
18000, 423876, 1, 0
19000, 430240, 1, 0
20000, 424224, 1, 0
21000, 429568, 1, 0
22000, 425660, 1, 0
23000, 430572, 1, 0
24001, 426569, 1, 0
25000, 432576, 1, 0
26000, 428800, 1, 0
27000, 428776, 1, 0
28000, 431152, 1, 0
29000, 426356, 1, 0
30000, 428288, 1, 0
31000, 427544, 1, 0
32000, 429144, 1, 0
33000, 426240, 1, 0

Version 3.20 output:
1000, 832088, 1, 0, 0
2000, 810048, 1, 0, 0
3000, 767012, 1, 0, 0
4001, 425120, 1, 0, 0
5001, 434592, 1, 0, 0
6002, 420224, 1, 0, 0
7002, 436056, 1, 0, 0
8003, 422568, 1, 0, 0
9003, 434752, 1, 0, 0
10004, 431520, 1, 0, 0
11004, 427456, 1, 0, 0
12005, 431680, 1, 0, 0
13005, 427420, 1, 0, 0
14005, 433104, 1, 0, 0
15006, 426756, 1, 0, 0
16006, 434096, 1, 0, 0
17007, 423008, 1, 0, 0
18007, 435200, 1, 0, 0
19008, 423164, 1, 0, 0
20008, 434088, 1, 0, 0
21009, 425660, 1, 0, 0
22009, 431480, 1, 0, 0
23010, 434312, 1, 0, 0
24010, 428032, 1, 0, 0
25010, 433120, 1, 0, 0
26011, 429184, 1, 0, 0
27011, 432576, 1, 0, 0
28012, 427552, 1, 0, 0
29012, 434400, 1, 0, 0
30013, 427008, 1, 0, 0
31013, 433012, 1, 0, 0
32014, 424940, 1, 0, 0
33014, 435144, 1, 0, 0
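To quantify the drift in logs like the ones above, a short Python sketch can compare each timestamp with the ideal grid of 1000 ms, 2000 ms, and so on. It assumes that the first comma-separated field of each line is the timestamp in milliseconds, that the averaging interval is 1000 ms as the logs suggest, and that no samples are missing. The function name `timestamp_drift` is hypothetical and is not part of fio.

```python
def timestamp_drift(lines, interval_ms=1000):
    """Return (sample_number, timestamp_ms, drift_ms) for each log entry.

    Assumes each non-blank line is a fio log entry whose first
    comma-separated field is the timestamp in milliseconds. Drift is
    measured against the ideal grid interval_ms, 2 * interval_ms, ...
    """
    entries = [line for line in lines if line.strip()]
    result = []
    for n, line in enumerate(entries, start=1):
        t_ms = int(line.split(",")[0])
        result.append((n, t_ms, t_ms - n * interval_ms))
    return result


# First six entries of the fio 3.20 log above.
v320 = [
    "1000, 832088, 1, 0, 0", "2000, 810048, 1, 0, 0",
    "3000, 767012, 1, 0, 0", "4001, 425120, 1, 0, 0",
    "5001, 434592, 1, 0, 0", "6002, 420224, 1, 0, 0",
]
for n, t_ms, drift in timestamp_drift(v320):
    print(f"sample {n}: t={t_ms} ms, drift={drift:+d} ms")
```

Applied to the full 3.20 log above, the 33rd sample (33014 ms) is 14 ms off the grid. Every entry in the 3.15 log is within 1 ms of the grid.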

Jeff Lien
eSSD Core SW Tools & Drivers

Western Digital
2900 37th St NW
Building 108-1
Rochester, MN 55901
Email:  Jeff.Lien@wdc.com
Office: +1-507-322-2416

* Re: Inconsistent Status Intervals
From: Bart Van Assche @ 2020-09-05  4:19 UTC
  To: Jeffrey Lien, fio; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis

On 2020-09-04 13:16, Jeffrey Lien wrote:
> An issue with fio version 3.20 was reported to me by our performance team where the status intervals for the iops and bw values gradually increase over the run time of the job.  Eventually, the delta gets large enough that it causes issues with the python script that formats and summarizes the fio output.  I included an example of the output at the bottom of this email.
> 
> I found this commit from Bart in version 3.18 that introduces the status interval variability:
> https://github.com/axboe/fio/commit/31eca641ad91634e5ffcf369cd756b0506a700c1
> And then this commit from Vincent that makes it better but doesn't completely resolve the issue:
> https://github.com/axboe/fio/commit/0f77d30d90b809fbf233b9474cc5d17c6bf73541
> 
> I looked at both of these patches but am not familiar enough with the fio code to know how best to resolve the issue.  Does anyone have any suggestions on what could be done to make the status intervals consistent?  

A candidate fix is available on the master branch of
https://github.com/bvanassche/fio. If the fix on that branch
helps I will send a pull request to Jens.

Thanks,

Bart.


* RE: Inconsistent Status Intervals
From: Jeffrey Lien @ 2020-09-08 13:25 UTC
  To: Bart Van Assche, fio; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis

Bart,
I'll give your fix a try today and let you know. Thanks.

* RE: Inconsistent Status Intervals
From: Jeffrey Lien @ 2020-09-08 17:59 UTC
  To: Bart Van Assche, fio; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis

Bart, 
Your fix did not resolve the issue. The timestamps still gain an extra 1 ms every 2nd or 3rd interval.
1000, 704468, 1, 0, 0
2001, 829472, 1, 0, 0
3001, 837708, 1, 0, 0
4002, 828268, 1, 0, 0
5002, 830692, 1, 0, 0
6002, 826260, 1, 0, 0
7003, 838460, 1, 0, 0
8003, 831724, 1, 0, 0
9004, 841744, 1, 0, 0
10004, 834260, 1, 0, 0
11005, 835928, 1, 0, 0
12005, 824180, 1, 0, 0
13005, 827860, 1, 0, 0
14006, 834212, 1, 0, 0
15006, 821324, 1, 0, 0
16007, 822864, 1, 0, 0
17007, 824908, 1, 0, 0
18007, 841144, 1, 0, 0
19008, 826712, 1, 0, 0
20008, 822496, 1, 0, 0
21009, 826352, 1, 0, 0
22009, 828428, 1, 0, 0
23010, 821648, 1, 0, 0
24010, 836984, 1, 0, 0
25011, 842084, 1, 0, 0
26011, 833924, 1, 0, 0
27011, 830084, 1, 0, 0
28012, 835876, 1, 0, 0
29012, 840720, 1, 0, 0
30013, 832967, 1, 0, 0
31014, 827632, 1, 0, 0
32014, 823948, 1, 0, 0
33015, 840832, 1, 0, 0
34015, 833556, 1, 0, 0
35015, 827896, 1, 0, 0
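One possible mechanism could produce this pattern, although nothing in this thread confirms it. If each interval is measured from the moment the previous sample was actually taken, instead of from a fixed start time, then a small per-interval overhead accumulates over the run. The Python simulation below contrasts relative and absolute deadlines. The function names and the 0.4 ms overhead are illustrative assumptions, and neither function describes fio's actual helper-thread code. Times are kept in integer microseconds to avoid floating-point rounding and are then truncated to whole milliseconds, as in the logs.

```python
def relative_schedule(n, interval_us=1_000_000, overhead_us=400):
    """Each wakeup is scheduled interval_us after the previous sample was
    actually taken, so the per-sample overhead accumulates."""
    t_us, stamps = 0, []
    for _ in range(n):
        t_us += interval_us + overhead_us  # sleep one interval, then pay overhead
        stamps.append(t_us // 1000)        # log timestamps are whole ms
    return stamps


def absolute_schedule(n, interval_us=1_000_000, overhead_us=400):
    """Each wakeup targets k * interval_us from the start; the overhead
    delays one sample but does not move the deadlines after it."""
    return [(k * interval_us + overhead_us) // 1000 for k in range(1, n + 1)]


print(relative_schedule(8))
print(absolute_schedule(8))
```

With absolute deadlines, the error never exceeds the overhead of a single interval. With relative deadlines, it grows linearly with run time. That matches the shape of the 3.20 logs, though not necessarily their cause.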

* Re: Inconsistent Status Intervals
From: Bart Van Assche @ 2020-09-08 19:09 UTC
  To: Jeffrey Lien, fio; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis

On 2020-09-08 10:59, Jeffrey Lien wrote:
> Your fix did not resolve the issue.

Hi Jeff,

Thanks for retesting. Something I should have asked first: which operating
system are you using? Linux, Windows, OS/X or perhaps yet another operating
system?

Thanks,

Bart.

* RE: Inconsistent Status Intervals
From: Jeffrey Lien @ 2020-09-08 20:44 UTC
  To: Bart Van Assche, fio; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis

Bart, 
I'm running on Red Hat Enterprise Linux 7.8, kernel version 3.10.0-1127.18.2.

* RE: Inconsistent Status Intervals
From: Jeffrey Lien @ 2020-09-11 17:10 UTC
  To: Jeffrey Lien, Bart Van Assche, fio
  Cc: Siriporn Swart, Jeff Furlong, Kris Davis

Bart,
Do you have any other suggestions or potential fixes to try?  If so, please let us know.  Thanks.  

* Re: Inconsistent Status Intervals
From: Bart Van Assche @ 2020-09-12  4:16 UTC
  To: Jeffrey Lien, fio; +Cc: Siriporn Swart, Jeff Furlong, Kris Davis

On 2020-09-11 10:10, Jeffrey Lien wrote:
> Do you have any other suggestions or potential fixes to try?  If so, please let us know.  Thanks.  

Hi Jeff,

The master branch of https://github.com/bvanassche/fio has been
updated. Please retry.

Thanks,

Bart.

* RE: Inconsistent Status Intervals
From: Siriporn Swart @ 2020-09-14 23:22 UTC
  To: Bart Van Assche, Jeffrey Lien, fio; +Cc: Jeff Furlong, Kris Davis

Hello Bart, 

Thank you very much for your support. Jeff Lien is on vacation this week. My name is Siriporn, and I am on the performance test team.
I ran a quick 30-second test with the master branch of https://github.com/bvanassche/fio, and the result looks good. I will rerun with a longer duration and send any updates.

----- 
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.17-458-g3803
Starting 1 process

[root@echo-sr635-01 result]# cat test_iops.1.log
1000, 90808, 0, 0, 0
2000, 99240, 0, 0, 0
3000, 99095, 0, 0, 0
4000, 99229, 0, 0, 0
5000, 99270, 0, 0, 0
6000, 99351, 0, 0, 0
7000, 99209, 0, 0, 0
8000, 99299, 0, 0, 0
9000, 99297, 0, 0, 0
10000, 99395, 0, 0, 0
11000, 99292, 0, 0, 0
12000, 99385, 0, 0, 0
13000, 99322, 0, 0, 0
14000, 99293, 0, 0, 0
15000, 99266, 0, 0, 0
16000, 99335, 0, 0, 0
17000, 99367, 0, 0, 0
18000, 99380, 0, 0, 0
19000, 99309, 0, 0, 0
20000, 99290, 0, 0, 0
21000, 99300, 0, 0, 0
22000, 99377, 0, 0, 0
23000, 99381, 0, 0, 0
24000, 99395, 0, 0, 0
25000, 99355, 0, 0, 0
26000, 99371, 0, 0, 0
27000, 99312, 0, 0, 0
28000, 99337, 0, 0, 0
29000, 99264, 0, 0, 0
[root@echo-sr635-01 result]#

Result from fio-3.20
[root@echo-sr635-01 PB319]# cat test_iops.1.log
1000, 84767, 0, 0, 0
2001, 99750, 0, 0, 0
3002, 99973, 0, 0, 0
4003, 99833, 0, 0, 0
5004, 99844, 0, 0, 0
6005, 99817, 0, 0, 0
7006, 99842, 0, 0, 0
8007, 99856, 0, 0, 0
9008, 99824, 0, 0, 0
10009, 99824, 0, 0, 0
11010, 99807, 0, 0, 0
12011, 99874, 0, 0, 0
13012, 99779, 0, 0, 0
14013, 99968, 0, 0, 0
15014, 99954, 0, 0, 0
16015, 99900, 0, 0, 0
17016, 99867, 0, 0, 0
18017, 99812, 0, 0, 0
19018, 99826, 0, 0, 0
20019, 99844, 0, 0, 0
21020, 99833, 0, 0, 0
22021, 99782, 0, 0, 0
23022, 99814, 0, 0, 0
24023, 99875, 0, 0, 0
25024, 99964, 0, 0, 0
26025, 99847, 0, 0, 0
27026, 99859, 0, 0, 0
28027, 99825, 0, 0, 0
29028, 99863, 0, 0, 0

Best Regards
**************************************************
Siriporn Swart 
SIT Lab SSD Performance Test Engineer 

Western Digital®
Email:  siriporn.swart@wdc.com
Office:  +1-507-322-2123

* RE: Inconsistent Status Intervals
From: Siriporn Swart @ 2020-09-18 19:24 UTC
  To: Bart Van Assche, Jeffrey Lien, fio; +Cc: Jeff Furlong, Kris Davis

Hello Bart, 

I have additional test results from https://github.com/bvanassche/fio. The timestamp issue is fixed when running a single job (--numjobs=1). However, in a test with multiple jobs, each logging latency at 1 s intervals, there are sporadic deltas of 0.240-0.250 s in the iops and bw logs.
Second, the lat logs contain one extra second beyond the intended runtime. I was able to reproduce the problem with the fio command line below. Snippets of the output from my test bench follow.

fio --name=multijobs_noramp --filename=/dev/nvme1n1 --ioengine=libaio  --rw=randread --bs=4k --direct=1 --iodepth=64 --numjobs=4 --runtime=20m --write_iops_log=multijobs_noramp --write_lat_log=multijobs_noramp --group_reporting=1 --log_avg_msec=1000 --output=multijobs_noramp

cat multijobs_noramp_iops.4.log
1120, 195213, 0, 0, 0
2120, 184542, 0, 0, 0
3120, 184310, 0, 0, 0

cat multijobs_noramp_iops.3.log
1121, 195074, 0, 0, 0
2121, 184694, 0, 0, 0
3121, 184359, 0, 0, 0

cat multijobs_noramp_iops.2.log
1000, 194010, 0, 0, 0
2000, 206103, 0, 0, 0
3000, 206836, 0, 0, 0

cat multijobs_noramp_iops.1.log
1000, 193926, 0, 0, 0
2000, 206252, 0, 0, 0
3000, 206148, 0, 0, 0

cat multijobs_noramp_lat.4.log
1198000, 309048, 0, 0, 0
1199000, 309715, 0, 0, 0
1200000, 309382, 0, 0, 0
1200000, 587410, 0, 0, 0

cat multijobs_noramp_lat.3.log
1198000, 308499, 0, 0, 0
1199000, 308733, 0, 0, 0
1200000, 308176, 0, 0, 0
1200000, 818831, 0, 0, 0

cat multijobs_noramp_lat.2.log
1198000, 310959, 0, 0, 0
1199000, 308661, 0, 0, 0
1200000, 309743, 0, 0, 0
1200000, 1036165, 0, 0, 0

cat multijobs_noramp_lat.1.log
1198000, 308318, 0, 0, 0
1199000, 309911, 0, 0, 0
1200000, 309643, 0, 0, 0
1200000, 1005811, 0, 0, 0
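Both symptoms can be checked mechanically across the per-job logs. This Python sketch assumes, as fio's log format does, that the first comma-separated field is the timestamp in milliseconds. It reports each job's first timestamp: jobs 3 and 4 above start at 1121 and 1120 ms, while jobs 1 and 2 start at 1000 ms. It also lists timestamps that appear more than once, such as the repeated 1200000 entry at the end of each lat log. The function names are hypothetical, and the inline data are copied from the snippets above.

```python
from collections import Counter


def first_timestamp(lines):
    """Timestamp (ms) of the first non-blank fio log entry."""
    for line in lines:
        if line.strip():
            return int(line.split(",")[0])
    raise ValueError("empty log")


def repeated_timestamps(lines):
    """Timestamps (ms) that occur more than once in a log, sorted."""
    counts = Counter(int(line.split(",")[0]) for line in lines if line.strip())
    return sorted(t for t, c in counts.items() if c > 1)


iops_logs = {
    1: ["1000, 193926, 0, 0, 0", "2000, 206252, 0, 0, 0"],
    4: ["1120, 195213, 0, 0, 0", "2120, 184542, 0, 0, 0"],
}
lat_tail = [
    "1199000, 309715, 0, 0, 0",
    "1200000, 309382, 0, 0, 0",
    "1200000, 587410, 0, 0, 0",
]

for job, lines in sorted(iops_logs.items()):
    print(f"job {job}: first sample at {first_timestamp(lines)} ms")
print("repeated lat timestamps:", repeated_timestamps(lat_tail))
```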

Best Regards
**************************************************
Siriporn Swart 
SIT Lab SSD Performance Test Engineer 

Western Digital®
Email:  siriporn.swart@wdc.com
Office:  +1-507-322-2123

* Re: Inconsistent Status Intervals
From: Bart Van Assche @ 2020-09-21  1:48 UTC
  To: Siriporn Swart, Jeffrey Lien, fio; +Cc: Jeff Furlong, Kris Davis

On 2020-09-18 12:24, Siriporn Swart wrote:
> I got addition test results from https://github.com/bvanassche/fio.   Timestamp issue is fixed when running in single job ( --numjobs=1), however on a test with multiple jobs, each logging latency at 1s, there are sporadic deltas of 0.240 - 0.250s in iops and bw log.
> Second issue, there are +1 second in lat log of the intended runtime.  I was able to recreate the problem with fio command line in below including snippet of output from my test bench.
> 
> fio --name=multijobs_noramp --filename=/dev/nvme1n1 --ioengine=libaio  --rw=randread --bs=4k --direct=1 --iodepth=64 --numjobs=4 --runtime=20m --write_iops_log=multijobs_noramp --write_lat_log=multijobs_noramp --group_reporting=1 --log_avg_msec=1000 --output=multijobs_noramp

Hi Siriporn,

Thank you for the detailed report. This weekend I did something that I
should have done much earlier, namely analyze in the context of which thread
the latency logging happens. My conclusion is that it happens in the
context of the I/O threads. In other words, the sporadic deltas can't be
a side effect of the changes I made in the helper thread. If someone else
wants to work on eliminating these latency logging deltas, I will be happy
to assist.

Bart.

* RE: Inconsistent Status Intervals
From: Jeffrey Lien @ 2020-09-23 17:07 UTC
  To: Bart Van Assche, Siriporn Swart, fio; +Cc: Jeff Furlong, Kris Davis

Hi Bart,
I would be willing to work on the latency logging issue in the IO threads, but don't know where to start.  Do you know which parts and/or functions to start looking at?  

* Re: Inconsistent Status Intervals
From: Bart Van Assche @ 2020-09-24  1:12 UTC
  To: Jeffrey Lien, Siriporn Swart, fio; +Cc: Jeff Furlong, Kris Davis

On 2020-09-23 10:07, Jeffrey Lien wrote:
> Hi Bart,
> I would be willing to work on the latency logging issue in the IO threads, but don't know where to start.  Do you know which parts and/or functions to start looking at?  

Hi Jeff,

Please take a look at add_lat_sample() and add_clat_sample() in stat.c. I
think these are called from the I/O path (account_io_completion()). The
__add_log_sample() function is called by these functions and records a
sample in a data structure.
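The averaging requested by --log_avg_msec can be modeled roughly as follows. Samples from the I/O path are accumulated into a window, and an averaged entry is emitted when a sample arrives at or after the window's end. This is a hypothetical Python model, not a transcription of fio's __add_log_sample(). The class, its methods, and the fixed_grid switch are invented for illustration. The model isolates one design choice that matters for timestamp drift:

- Advancing the window boundary by whole windows keeps the logged timestamps on a fixed grid.
- Restarting the window at the arrival time of the sample that closed it carries the lateness of each closing sample into every later timestamp.

```python
class WindowedAverager:
    """Hypothetical model of averaged logging: accumulate samples and emit
    (timestamp_ms, integer mean) once a sample arrives at or after the
    current window end."""

    def __init__(self, avg_ms=1000, fixed_grid=True):
        self.avg_ms = avg_ms
        self.fixed_grid = fixed_grid
        self.window_end = avg_ms
        self.total = 0
        self.count = 0
        self.entries = []

    def add_sample(self, now_ms, value):
        if now_ms >= self.window_end:
            stamp = self.window_end if self.fixed_grid else now_ms
            if self.count:
                self.entries.append((stamp, self.total // self.count))
            self.total = self.count = 0
            if self.fixed_grid:
                # Advance by whole windows so boundaries stay on the grid.
                while self.window_end <= now_ms:
                    self.window_end += self.avg_ms
            else:
                # Restart the window at the closing sample's arrival time.
                self.window_end = now_ms + self.avg_ms
        self.total += value
        self.count += 1


def logged_stamps(fixed_grid, avg_ms=10, step_ms=3, end_ms=35):
    """Feed constant-valued completions at 1, 1 + step_ms, ... (< end_ms)
    and return the timestamps of the emitted entries."""
    avg = WindowedAverager(avg_ms=avg_ms, fixed_grid=fixed_grid)
    for t in range(1, end_ms, step_ms):
        avg.add_sample(t, 5)
    return [stamp for stamp, _ in avg.entries]


print(logged_stamps(True))
print(logged_stamps(False))
```

With completions every 3 ms and a 10 ms window, the fixed grid logs entries at 10, 20 and 30 ms. The restarted window drifts to 10, 22 and 34 ms.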

Bart.

Thread overview: 13+ messages
2020-09-04 20:16 ` Inconsistent Status Intervals Jeffrey Lien
2020-09-05  4:19   ` Bart Van Assche
2020-09-08 13:25     ` Jeffrey Lien
2020-09-08 17:59       ` Jeffrey Lien
2020-09-08 19:09         ` Bart Van Assche
2020-09-08 20:44           ` Jeffrey Lien
2020-09-11 17:10             ` Jeffrey Lien
2020-09-12  4:16               ` Bart Van Assche
2020-09-14 23:22                 ` Siriporn Swart
2020-09-18 19:24                   ` Siriporn Swart
2020-09-21  1:48                     ` Bart Van Assche
2020-09-23 17:07                       ` Jeffrey Lien
2020-09-24  1:12                         ` Bart Van Assche