All of lore.kernel.org
* Fwd: FIO 3.11
       [not found] <BN7PR08MB409869032E10FEC58C9CBE08D9FF0@BN7PR08MB4098.namprd08.prod.outlook.com>
@ 2018-10-17 19:03 ` Jens Axboe
  2018-10-17 21:41   ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2018-10-17 19:03 UTC (permalink / raw)
  To: fio; +Cc: Etienne-Hugues Fortin




-------- Forwarded Message --------
Subject: 	FIO 3.11
Date: 	Wed, 17 Oct 2018 18:45:03 +0000
From: 	Etienne-Hugues Fortin <efortin@cyberspicace.com>
To: 	axboe@kernel.dk <axboe@kernel.dk>



Hi,

Sorry to bother you with these questions, but I've been unable to find the answers on my own. I've been experimenting with FIO for a little over a month, but I'm still unsure about the following points.

First, when we create a profile, my assumption is that if I have two jobs without using 'stonewall', each job will run at the same time with the same priority. So, if I want a job with a 50% read/write mix that is 25% random and 75% sequential, I would need a job file that looks like this:

[128k_random]
rw=randread

[128k_seq_1]
numjobs=3
rw=read

[128k_random_w]
rw=randwrite

[128k_seq_1_w]
numjobs=3
rw=write

Here, 6 of the 8 jobs (75%) are sequential, and 50% of the jobs are reads. I've run this job file against a standard NFS server with SSDs as the backend. With 4 clients running FIO simultaneously, I get 4 results: 65/113, 68/108, 67/115 and 66/111 MiB/s (read/write), which is about 37% read and 63% write. What would explain getting more writes than reads with this scenario? I've also run the job on an all-flash NAS array and got similar results, so I think this may have to do with how quickly the storage answers. On a typical storage unit, writes are usually cached in memory, so they are acknowledged faster. My hypothesis is that as soon as FIO gets a completion, it immediately issues a new I/O, which would skew the results the way I'm seeing. Is that what is happening? If so, other than tuning numjobs to get about 50% R/W, is there a better way of doing this?

Related to the same simulation, I saw that we can also use rw and randrw to combine reads and writes in a sequential or random workload, and then set the percentage of the mix with rwmixread and/or rwmixwrite. What I've not been able to find out is whether the reads and writes are issued against each file or not. With a single job, they have to be. However, if I set numjobs=10, will all 10 files get both reads and writes, or can I have 40% of the files receive only writes and 60% receive only reads? My current thinking is that the mixed workload is executed on each file. If that's the case, it's not what I want to simulate. Is there another way of doing this?
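For illustration, a rough sketch of the layout I have in mind, where 60% of the files only receive reads and 40% only receive writes (the directory, size and job counts below are just placeholders):

[global]
bs=128k
size=1g
; placeholder mount point
directory=/mnt/nfs

[readers]
; each of these 6 clones gets its own file, which is only read
rw=read
numjobs=6

[writers]
; each of these 4 clones gets its own file, which is only written
rw=write
numjobs=4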

I also did some testing on a raw device (block storage) and tried the latency target profile. I can see that if I give it the latency target needed (0.75 ms) at the 75th percentile, it gets to about the level of IOPS the unit can sustain while staying under the target 75% of the time. However, I'm not seeing the I/O depth gradually increase as the test progresses, so creating the final job file is still not as simple as looking at the result and picking the depth that makes sense. Did I miss something in the results or in the latency profile job? My job file is the following:

[global]
ioengine=libaio
invalidate=1
iodepth=32
direct=1
latency_target=750
latency_window=30s
latency_percentile=75
per_job_logs=0
group_reporting
bs=32k
rw=randrw
rwmixread=60
write_bw_log=blocks
write_lat_log=blocks
write_iops_log=blocks

[sdb-32k]
filename=/dev/sdb

[sdc-32k]
filename=/dev/sdc

[sdd-32k]
filename=/dev/sdd

[sde-32k]
filename=/dev/sde

The last thing is about the bw/iops/lat logs. I've used them in the latency test and in some other job files, and the filename format I'm getting is always filename_[bw|clat|iops|lat|slat].log.server_name.

With this format, fio_generate_plots and fio2gnuplot can't find any files. It seems those programs expect files named like *_bw.log, with no server name at the end. Is there a way to get the server name at the beginning of the filename? The documentation seems to indicate that the logs should end in .log, but that's not what I'm getting so far.

Thank you for your assistance, and keep up the good work on such a great tool.

Etienne Fortin



^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: FIO 3.11
  2018-10-17 19:03 ` Fwd: FIO 3.11 Jens Axboe
@ 2018-10-17 21:41   ` Elliott, Robert (Persistent Memory)
  2018-10-18  0:42     ` Etienne-Hugues Fortin
  2018-10-18  4:19     ` Sitsofe Wheeler
  0 siblings, 2 replies; 5+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2018-10-17 21:41 UTC (permalink / raw)
  To: Etienne-Hugues Fortin, fio, 'Jens Axboe'

> First, when we create a profile, my assumption is that if I have two jobs without using 'stonewall',
> each job will run at the same time with the same priority. So, if I want a job with a 50% read/write
> mix that is 25% random and 75% sequential, I would need a job file that looks like this:
...
> If so, other than tuning numjobs to get about 50% R/W, is there a
> better way of doing this?

The OS controls how different threads progress, and there's nothing keeping them in
lockstep.

These will make each thread implement your desired mix:

       rwmixread=int
              Percentage of a mixed workload that should be reads. Default: 50.

       rwmixwrite=int
              Percentage of a mixed workload that should be writes. If both rwmixread and
              rwmixwrite is given and the values do not add up to 100%, the latter of the two
              will be used to override the first.

       percentage_random=int[,int][,int]
              For  a  random  workload,  set how big a percentage should be random. This defaults
              to 100%, in which case the workload is fully random. It can be set from anywhere
              from 0 to 100. Setting it to 0 would make the workload fully sequential. Any setting
              in between will result in a random mix of sequential and  random  I/O,  at  the
              given percentages. Comma-separated values may be specified for reads, writes, and
              trims as described in blocksize.

Threads running sequential accesses can easily benefit from cache hits from each other, if
there is any caching or prefetching done by the involved drivers or devices.  One thread
takes the lead and suffers delays, while the others benefit from its work and stay close
behind.  They can take turns, but tend to stay clustered together. This can distort results.
Random accesses avoid that problem, provided the capacity is much larger than any caches.
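As a rough sketch, a single job section built from those options would give every thread (numjobs clone) the 50% read / 25% random mix directly, so the aggregate ratio holds regardless of how fast reads or writes complete. The directory, size and job count below are placeholders, not taken from your setup:

[global]
ioengine=libaio
direct=1
bs=128k
size=10g
; placeholder target directory
directory=/mnt/test

[mixed-128k]
; each clone does 50% reads and 25% random I/O itself
rw=randrw
rwmixread=50
percentage_random=25
numjobs=4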



^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: FIO 3.11
  2018-10-17 21:41   ` Elliott, Robert (Persistent Memory)
@ 2018-10-18  0:42     ` Etienne-Hugues Fortin
  2018-10-18  4:19     ` Sitsofe Wheeler
  1 sibling, 0 replies; 5+ messages in thread
From: Etienne-Hugues Fortin @ 2018-10-18  0:42 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory), fio, 'Jens Axboe'

Hello,

Well, that explains it. I just tested it in my new test environment and it seems to work fine; reads and writes are much better aligned this way. It works well on block devices (LUNs), but performance is not as good on files that live on NAS. It keeps the proper read/write ratio, but at a slower pace compared to my previous setup that was segmented by jobs (sequential read, sequential write, random read, random write). By manually adjusting numjobs on each, I was able to get the read/write ratio I needed, and since my jobs run on large files, I could see from the storage side that the disks were providing the data, not the cache.

I guess I'll have to play around to confirm that's really what is happening, but at first glance it looks like it.

Thank you for your answer. It corrected my assumptions, which will help going forward.


Etienne

-----Original Message-----
From: Elliott, Robert (Persistent Memory) <elliott@hpe.com>
Sent: October 17, 2018 5:42 PM
To: Etienne-Hugues Fortin <efortin@cyberspicace.com>; fio@vger.kernel.org; 'Jens Axboe' <axboe@kernel.dk>
Subject: RE: FIO 3.11

> First, when we create a profile, my assumption is that if I have two
> jobs without using 'stonewall', each job will run at the same time
> with the same priority. So, if I want a job with a 50% read/write mix that is 25% random and 75% sequential, I would need a job file that looks like this:
...
> If so, other than tuning numjobs to get about 50%
> R/W, is there a better way of doing this?

The OS controls how different threads progress, and there's nothing keeping them in lockstep.

These will make each thread implement your desired mix:

       rwmixread=int
              Percentage of a mixed workload that should be reads. Default: 50.

       rwmixwrite=int
              Percentage of a mixed workload that should be writes. If both rwmixread and
              rwmixwrite is given and the values do not add up to 100%, the latter of the two
              will be used to override the first.

       percentage_random=int[,int][,int]
              For  a  random  workload,  set how big a percentage should be random. This defaults
              to 100%, in which case the workload is fully random. It can be set from anywhere
              from 0 to 100. Setting it to 0 would make the workload fully sequential. Any setting
              in between will result in a random mix of sequential and  random  I/O,  at  the
              given percentages. Comma-separated values may be specified for reads, writes, and
              trims as described in blocksize.

Threads running sequential accesses can easily benefit from cache hits from each other, if there is any caching or prefetching done by the involved drivers or devices.  One thread takes the lead and suffers delays, while the others benefit from its work and stay close behind.  They can take turns, but tend to stay clustered together. This can distort results.
Random accesses avoid that problem, provided the capacity is much larger than any caches.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: FIO 3.11
  2018-10-17 21:41   ` Elliott, Robert (Persistent Memory)
  2018-10-18  0:42     ` Etienne-Hugues Fortin
@ 2018-10-18  4:19     ` Sitsofe Wheeler
  2018-10-18 10:26       ` Etienne-Hugues Fortin
  1 sibling, 1 reply; 5+ messages in thread
From: Sitsofe Wheeler @ 2018-10-18  4:19 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory); +Cc: efortin, fio, Jens Axboe

On Wed, 17 Oct 2018 at 22:45, Elliott, Robert (Persistent Memory)
<elliott@hpe.com> wrote:
>
>
> These will make each thread implement your desired mix:
>
>        rwmixread=int
[...]
>        rwmixwrite=int
[...]
>        percentage_random=int[,int][,int]
[...]

Another (but more overlooked) option for forcing different jobs into
lockstep with each other is the flow parameter -
https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-flow
but it tends to be easiest to reason about when there are just two
jobs.
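For the two-job case, a sketch under the fio 3.x token-based semantics described in the docs linked above (one weight negative so the shared counter balances; the values, paths and flow_sleep setting below are illustrative assumptions, not tested):

[global]
ioengine=libaio
direct=1
bs=128k
; placeholder test file
filename=/mnt/test/flowfile
size=10g
time_based
runtime=60
; optional: sleep (in usec) instead of spinning when a job is ahead of its quota
flow_sleep=100

[reader]
; subtracts 3 per iteration from the shared flow counter
rw=randread
flow=-3

[writer]
; adds 7 per iteration, so reads:writes should settle near 7:3 (~70%/30%)
rw=randwrite
flow=7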

>
> Threads running sequential accesses can easily benefit from cache hits from each other, if
> there is any caching or prefetching done by the involved drivers or devices.  One thread
> takes the lead and suffers delays, while the others benefit from its work and stay close
> behind.  They can take turns, but tend to stay clustered together. This can distort results.
> Random accesses avoid that problem, provided the capacity is much larger than any caches.

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: FIO 3.11
  2018-10-18  4:19     ` Sitsofe Wheeler
@ 2018-10-18 10:26       ` Etienne-Hugues Fortin
  0 siblings, 0 replies; 5+ messages in thread
From: Etienne-Hugues Fortin @ 2018-10-18 10:26 UTC (permalink / raw)
  To: Sitsofe Wheeler, Elliott, Robert (Persistent Memory); +Cc: fio, Jens Axboe

Hi,

I tried the flow feature on a two-job workload as a simple test, but when using fio in VMs as load generators, fio used 100% of the CPU in each VM. Is that expected? As a result, there was basically no read/write workload reaching the disks. I may have misconfigured something, but I used it as in the butterfly seek pattern example, replacing the jobs with one for reads and a second for writes. I put flow=7 in the first job and flow=3 in the second job to get a 70%/30% split. As it was using all the resources, I didn't push it further or try it on physical servers, since that's not how we want to operate when doing load simulation.

I don't have this issue when using fio without flow in those same VMs.

Let me know if you have suggestions around flow with VMs; I'm willing to test it further if you think it should work without overloading the CPU in such a setup.

Thank you.

Etienne

-----Original Message-----
From: Sitsofe Wheeler <sitsofe@gmail.com>
Sent: October 18, 2018 12:19 AM
To: Elliott, Robert (Persistent Memory) <elliott@hpe.com>
Cc: efortin@cyberspicace.com; fio <fio@vger.kernel.org>; Jens Axboe <axboe@kernel.dk>
Subject: Re: FIO 3.11

On Wed, 17 Oct 2018 at 22:45, Elliott, Robert (Persistent Memory) <elliott@hpe.com> wrote:
>
>
> These will make each thread implement your desired mix:
>
>        rwmixread=int
[...]
>        rwmixwrite=int
[...]
>        percentage_random=int[,int][,int]
[...]

Another (but more overlooked) option for forcing different jobs into lockstep with each other is the flow parameter -
https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-flow but it tends to be easiest to reason about when there are just two jobs.

>
> Threads running sequential accesses can easily benefit from cache hits 
> from each other, if there is any caching or prefetching done by the 
> involved drivers or devices.  One thread takes the lead and suffers 
> delays, while the others benefit from its work and stay close behind.  They can take turns, but tend to stay clustered together. This can distort results.
> Random accesses avoid that problem, provided the capacity is much larger than any caches.

--
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-10-18 10:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <BN7PR08MB409869032E10FEC58C9CBE08D9FF0@BN7PR08MB4098.namprd08.prod.outlook.com>
2018-10-17 19:03 ` Fwd: FIO 3.11 Jens Axboe
2018-10-17 21:41   ` Elliott, Robert (Persistent Memory)
2018-10-18  0:42     ` Etienne-Hugues Fortin
2018-10-18  4:19     ` Sitsofe Wheeler
2018-10-18 10:26       ` Etienne-Hugues Fortin
