* CPUs, threads, and speed
@ 2020-01-15 15:50 Mauricio Tavares
  2020-01-15 17:28 ` Gruher, Joseph R
  2020-01-15 21:33 ` Elliott, Robert (Servers)
  0 siblings, 2 replies; 18+ messages in thread
From: Mauricio Tavares @ 2020-01-15 15:50 UTC (permalink / raw)
  To: fio

Let's say I have a config file to preload a drive that looks like this
(stolen from https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill_4KRandom_NVMe.ini)

[global]
name=4k random write 4 ios in the queue in 32 queues
filename=/dev/nvme0n1
ioengine=libaio
direct=1
bs=4k
rw=randwrite
iodepth=4
numjobs=32
buffered=0
size=100%
loops=2
randrepeat=0
norandommap
refill_buffers

[job1]

That is taking a ton of time, like days to go. Is there anything I can
do to speed it up? For instance, what is the default value for
cpus_allowed (or cpumask)[2]? Is it all CPUs? If not, what would I gain
by throwing more CPUs at the problem?

I also read[2] that by default fio uses fork. What would I get by going to threads?

[2] https://fio.readthedocs.io/en/latest/fio_doc.html#threads-processes-and-job-synchronization



* RE: CPUs, threads, and speed
  2020-01-15 15:50 CPUs, threads, and speed Mauricio Tavares
@ 2020-01-15 17:28 ` Gruher, Joseph R
  2020-01-15 18:04   ` Andrey Kuzmin
  2020-01-15 18:33   ` Kudryavtsev, Andrey O
  2020-01-15 21:33 ` Elliott, Robert (Servers)
  1 sibling, 2 replies; 18+ messages in thread
From: Gruher, Joseph R @ 2020-01-15 17:28 UTC (permalink / raw)
  To: Mauricio Tavares, fio

> -----Original Message-----
> From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> Mauricio Tavares
> Sent: Wednesday, January 15, 2020 7:51 AM
> To: fio@vger.kernel.org
> Subject: CPUs, threads, and speed
> 
> Let's say I have a config file to preload drive that looks like this (stolen from
> https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> _4KRandom_NVMe.ini)
> 
> [global]
> name=4k random write 4 ios in the queue in 32 queues
> filename=/dev/nvme0n1
> ioengine=libaio
> direct=1
> bs=4k
> rw=randwrite
> iodepth=4
> numjobs=32
> buffered=0
> size=100%
> loops=2
> randrepeat=0
> norandommap
> refill_buffers
> 
> [job1]
> 
> That is taking a ton of time, like days to go. Is there anything I can do to speed it
> up? 

When you say preload, do you just want to write in the full capacity of the drive?  A sequential workload with larger blocks will be faster, like:

[global]
ioengine=libaio
thread=1
direct=1
bs=128k
rw=write
numjobs=1
iodepth=128
size=100%
loops=2
[job00]
filename=/dev/nvme0n1

Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
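
For the 4K route, that might look something like the sketch below (untested;
the queue depth here is just an illustration, not a tuned number):

[global]
ioengine=libaio
thread=1
direct=1
bs=4k
rw=randwrite
numjobs=1
iodepth=128
size=100%
loops=2
randrepeat=0
# norandommap left out, so fio tracks which blocks it has already hit
[job00]
filename=/dev/nvme0n1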

-Joe


* Re: CPUs, threads, and speed
  2020-01-15 17:28 ` Gruher, Joseph R
@ 2020-01-15 18:04   ` Andrey Kuzmin
  2020-01-15 18:29     ` Mauricio Tavares
  2020-01-15 18:33   ` Kudryavtsev, Andrey O
  1 sibling, 1 reply; 18+ messages in thread
From: Andrey Kuzmin @ 2020-01-15 18:04 UTC (permalink / raw)
  To: Gruher, Joseph R; +Cc: Mauricio Tavares, fio

On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
<joseph.r.gruher@intel.com> wrote:
>
> > -----Original Message-----
> > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > Mauricio Tavares
> > Sent: Wednesday, January 15, 2020 7:51 AM
> > To: fio@vger.kernel.org
> > Subject: CPUs, threads, and speed
> >
> > Let's say I have a config file to preload drive that looks like this (stolen from
> > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > _4KRandom_NVMe.ini)
> >
> > [global]
> > name=4k random write 4 ios in the queue in 32 queues
> > filename=/dev/nvme0n1
> > ioengine=libaio
> > direct=1
> > bs=4k
> > rw=randwrite
> > iodepth=4
> > numjobs=32
> > buffered=0
> > size=100%
> > loops=2
> > randrepeat=0
> > norandommap
> > refill_buffers
> >
> > [job1]
> >
> > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > up?
>
> When you say preload, do you just want to write in the full capacity of the drive?

I believe that preload here means what in the SSD world is called drive
preconditioning. It means bringing a fresh drive into the steady state
where it gives you the true performance seen in production over months of
use, rather than the unrealistic fresh-drive random write IOPS.

> A sequential workload with larger blocks will be faster,

No, you cannot get the job done with sequential writes, since they don't
populate the FTL translation tables the way random writes do.

As to taking a ton of time, the rule of thumb is to give the SSD 2x capacity
worth of random writes. At today's speeds, that should take just a
couple of hours.

Regards,
Andrey

> like:
>
> [global]
> ioengine=libaio
> thread=1
> direct=1
> bs=128k
> rw=write
> numjobs=1
> iodepth=128
> size=100%
> loops=2
> [job00]
> filename=/dev/nvme0n1
>
> Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
>
> -Joe



* Re: CPUs, threads, and speed
  2020-01-15 18:04   ` Andrey Kuzmin
@ 2020-01-15 18:29     ` Mauricio Tavares
  2020-01-15 19:00       ` Andrey Kuzmin
  0 siblings, 1 reply; 18+ messages in thread
From: Mauricio Tavares @ 2020-01-15 18:29 UTC (permalink / raw)
  To: Andrey Kuzmin; +Cc: Gruher, Joseph R, fio

On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> <joseph.r.gruher@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > Mauricio Tavares
> > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > To: fio@vger.kernel.org
> > > Subject: CPUs, threads, and speed
> > >
> > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > _4KRandom_NVMe.ini)
> > >
> > > [global]
> > > name=4k random write 4 ios in the queue in 32 queues
> > > filename=/dev/nvme0n1
> > > ioengine=libaio
> > > direct=1
> > > bs=4k
> > > rw=randwrite
> > > iodepth=4
> > > numjobs=32
> > > buffered=0
> > > size=100%
> > > loops=2
> > > randrepeat=0
> > > norandommap
> > > refill_buffers
> > >
> > > [job1]
> > >
> > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > up?
> >
> > When you say preload, do you just want to write in the full capacity of the drive?
>
> I believe that preload here means what in SSD world is called drive
> preconditioning. It means bringing a fresh drive into steady mode
> where it gives you the true performance in production over months of
> use rather than the unrealistic fresh drive random write IOPS.
>
> > A sequential workload with larger blocks will be faster,
>
> No, you cannot get the job done by sequential writes since it doesn't
> populate FTL translation tables like random writes do.
>
> As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> worth of random writes. At today speeds, that should take just a
> couple of hours.
>
      When you say 2xcapacity worth of random writes, do you mean just
setting size=200%?

> Regards,
> Andrey
>
> > like:
> >
> > [global]
> > ioengine=libaio
> > thread=1
> > direct=1
> > bs=128k
> > rw=write
> > numjobs=1
> > iodepth=128
> > size=100%
> > loops=2
> > [job00]
> > filename=/dev/nvme0n1
> >
> > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> >
> > -Joe



* Re: CPUs, threads, and speed
  2020-01-15 17:28 ` Gruher, Joseph R
  2020-01-15 18:04   ` Andrey Kuzmin
@ 2020-01-15 18:33   ` Kudryavtsev, Andrey O
  1 sibling, 0 replies; 18+ messages in thread
From: Kudryavtsev, Andrey O @ 2020-01-15 18:33 UTC (permalink / raw)
  To: Gruher, Joseph R, Mauricio Tavares, fio

I can clarify that, as I posted the original script on GitHub.
Sequential preconditioning is mandatory for bandwidth tests. Random 4k preconditioning is for everything else. For all mixed scenarios the data has to be randomized, as that puts the highest pressure on the drive (and internally on the WAF in the case of a NAND SSD). That makes all of the following benchmarks fair.
norandommap - I agree in general, but the fio overhead of tracking LBAs impacts performance and extends the pre-fill time.
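
As a rough sketch of those two phases in a single job file (the device name,
queue depths and the stonewall between the phases are illustrative, not taken
from the original script):

# phase 1: sequential fill (needed before bandwidth tests)
[seq-fill]
filename=/dev/nvme0n1
ioengine=libaio
direct=1
bs=128k
rw=write
iodepth=128
size=100%

# phase 2: 4k random fill (needed before everything else)
[rand-fill]
# wait for the sequential pass to finish first
stonewall
filename=/dev/nvme0n1
ioengine=libaio
direct=1
bs=4k
rw=randwrite
iodepth=32
size=200%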

-- 
Andrey Kudryavtsev, 
SSD Solution Architect
Intel Corp. 


On 1/15/20, 9:29 AM, "fio-owner@vger.kernel.org on behalf of Gruher, Joseph R" <fio-owner@vger.kernel.org on behalf of joseph.r.gruher@intel.com> wrote:

    > -----Original Message-----
    > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
    > Mauricio Tavares
    > Sent: Wednesday, January 15, 2020 7:51 AM
    > To: fio@vger.kernel.org
    > Subject: CPUs, threads, and speed
    > 
    > Let's say I have a config file to preload drive that looks like this (stolen from
    > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
    > _4KRandom_NVMe.ini)
    > 
    > [global]
    > name=4k random write 4 ios in the queue in 32 queues
    > filename=/dev/nvme0n1
    > ioengine=libaio
    > direct=1
    > bs=4k
    > rw=randwrite
    > iodepth=4
    > numjobs=32
    > buffered=0
    > size=100%
    > loops=2
    > randrepeat=0
    > norandommap
    > refill_buffers
    > 
    > [job1]
    > 
    > That is taking a ton of time, like days to go. Is there anything I can do to speed it
    > up? 
    
    When you say preload, do you just want to write in the full capacity of the drive?  A sequential workload with larger blocks will be faster, like:
    
    [global]
    ioengine=libaio
    thread=1
    direct=1
    bs=128k
    rw=write
    numjobs=1
    iodepth=128
    size=100%
    loops=2
    [job00]
    filename=/dev/nvme0n1
    
    Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
    
    -Joe
    



* Re: CPUs, threads, and speed
  2020-01-15 18:29     ` Mauricio Tavares
@ 2020-01-15 19:00       ` Andrey Kuzmin
  2020-01-15 20:36         ` Mauricio Tavares
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Kuzmin @ 2020-01-15 19:00 UTC (permalink / raw)
  To: Mauricio Tavares; +Cc: Gruher, Joseph R, fio

On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > <joseph.r.gruher@intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > Mauricio Tavares
> > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > To: fio@vger.kernel.org
> > > > Subject: CPUs, threads, and speed
> > > >
> > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > _4KRandom_NVMe.ini)
> > > >
> > > > [global]
> > > > name=4k random write 4 ios in the queue in 32 queues
> > > > filename=/dev/nvme0n1
> > > > ioengine=libaio
> > > > direct=1
> > > > bs=4k
> > > > rw=randwrite
> > > > iodepth=4
> > > > numjobs=32
> > > > buffered=0
> > > > size=100%
> > > > loops=2
> > > > randrepeat=0
> > > > norandommap
> > > > refill_buffers
> > > >
> > > > [job1]
> > > >
> > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > up?
> > >
> > > When you say preload, do you just want to write in the full capacity of the drive?
> >
> > I believe that preload here means what in SSD world is called drive
> > preconditioning. It means bringing a fresh drive into steady mode
> > where it gives you the true performance in production over months of
> > use rather than the unrealistic fresh drive random write IOPS.
> >
> > > A sequential workload with larger blocks will be faster,
> >
> > No, you cannot get the job done by sequential writes since it doesn't
> > populate FTL translation tables like random writes do.
> >
> > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > worth of random writes. At today speeds, that should take just a
> > couple of hours.
> >
>       When you say 2xcapacity worth of random writes, do you mean just
> setting size=200%?

Right.

Regards,
Andrey
>
> > Regards,
> > Andrey
> >
> > > like:
> > >
> > > [global]
> > > ioengine=libaio
> > > thread=1
> > > direct=1
> > > bs=128k
> > > rw=write
> > > numjobs=1
> > > iodepth=128
> > > size=100%
> > > loops=2
> > > [job00]
> > > filename=/dev/nvme0n1
> > >
> > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > >
> > > -Joe



* Re: CPUs, threads, and speed
  2020-01-15 19:00       ` Andrey Kuzmin
@ 2020-01-15 20:36         ` Mauricio Tavares
  2020-01-16  6:59           ` Andrey Kuzmin
  0 siblings, 1 reply; 18+ messages in thread
From: Mauricio Tavares @ 2020-01-15 20:36 UTC (permalink / raw)
  To: Andrey Kuzmin; +Cc: Gruher, Joseph R, fio

On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > <joseph.r.gruher@intel.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > > Mauricio Tavares
> > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > To: fio@vger.kernel.org
> > > > > Subject: CPUs, threads, and speed
> > > > >
> > > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > > _4KRandom_NVMe.ini)
> > > > >
> > > > > [global]
> > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > filename=/dev/nvme0n1
> > > > > ioengine=libaio
> > > > > direct=1
> > > > > bs=4k
> > > > > rw=randwrite
> > > > > iodepth=4
> > > > > numjobs=32
> > > > > buffered=0
> > > > > size=100%
> > > > > loops=2
> > > > > randrepeat=0
> > > > > norandommap
> > > > > refill_buffers
> > > > >
> > > > > [job1]
> > > > >
> > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > up?
> > > >
> > > > When you say preload, do you just want to write in the full capacity of the drive?
> > >
> > > I believe that preload here means what in SSD world is called drive
> > > preconditioning. It means bringing a fresh drive into steady mode
> > > where it gives you the true performance in production over months of
> > > use rather than the unrealistic fresh drive random write IOPS.
> > >
> > > > A sequential workload with larger blocks will be faster,
> > >
> > > No, you cannot get the job done by sequential writes since it doesn't
> > > populate FTL translation tables like random writes do.
> > >
> > > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > > worth of random writes. At today speeds, that should take just a
> > > couple of hours.
> > >
> >       When you say 2xcapacity worth of random writes, do you mean just
> > setting size=200%?
>
> Right.
>
      Then I wonder what I am doing wrong now. I changed the config file to

[root@testbox tests]# cat preload.conf
[global]
name=4k random write 4 ios in the queue in 32 queues
ioengine=libaio
direct=1
bs=4k
rw=randwrite
iodepth=4
numjobs=32
buffered=0
size=200%
loops=2
random_generator=tausworthe64
thread=1

[job1]
filename=/dev/nvme0n1
[root@testbox tests]#

but when I run it, now it spits out much larger eta times:

Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
16580099d:14h:55m:27s]]

Compare with what I was getting with size=100%

 Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]

> Regards,
> Andrey
> >
> > > Regards,
> > > Andrey
> > >
> > > > like:
> > > >
> > > > [global]
> > > > ioengine=libaio
> > > > thread=1
> > > > direct=1
> > > > bs=128k
> > > > rw=write
> > > > numjobs=1
> > > > iodepth=128
> > > > size=100%
> > > > loops=2
> > > > [job00]
> > > > filename=/dev/nvme0n1
> > > >
> > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > >
> > > > -Joe



* RE: CPUs, threads, and speed
  2020-01-15 15:50 CPUs, threads, and speed Mauricio Tavares
  2020-01-15 17:28 ` Gruher, Joseph R
@ 2020-01-15 21:33 ` Elliott, Robert (Servers)
  2020-01-15 22:39   ` Mauricio Tavares
  1 sibling, 1 reply; 18+ messages in thread
From: Elliott, Robert (Servers) @ 2020-01-15 21:33 UTC (permalink / raw)
  To: Mauricio Tavares, fio



> -----Original Message-----
> From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> Mauricio Tavares
> Sent: Wednesday, January 15, 2020 9:51 AM
> Subject: CPUs, threads, and speed
> 
...
> [global]
> name=4k random write 4 ios in the queue in 32 queues
> filename=/dev/nvme0n1
> ioengine=libaio
> direct=1
> bs=4k
> rw=randwrite
> iodepth=4
> numjobs=32
> buffered=0
> size=100%
> loops=2
> randrepeat=0
> norandommap
> refill_buffers
> 
> [job1]
> 
> That is taking a ton of time, like days to go. Is there anything I can
> do to speed it up? For instance, what is the default value for
> cpus_allowed (or cpumask)[2]? Is it all CPUs? If not what would I gain
> by throwing more cpus at the problem?
> 
> I also read[2] by default fio uses fork. What would I get by going to
> threads?

> Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]

77 kIOPs for random writes isn't bad - check your drive data sheet.
If the drive is 1 TB, it should take 
    1 TB / (77k * 4 KiB) = 3170 s = 52.8 minutes 
to write the whole drive.

Best practice is to use all CPU cores, lock threads to cores, and
be NUMA aware. If the device is attached to physical CPU 0 and that CPU
has 12 cores known to linux as 0-11 (per "lscpu" or "numactl --hardware"),
try:
  iodepth=16
  numjobs=12
  cpus_allowed=0-11
  cpus_allowed_policy=split
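
One way to check which node the device actually sits on (assuming the
controller shows up as nvme0 and is a PCI device; exact sysfs paths can
vary by kernel):

  # NUMA node of the NVMe controller's PCI device (-1 means unknown)
  cat /sys/class/nvme/nvme0/device/numa_node
  # CPUs belonging to that node (here node 0)
  cat /sys/devices/system/node/node0/cpulist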

Based on these:
  numjobs=32, size=100%, loops=2
fio will run each job for that many bytes, so a 1 TB drive will result 
in IOs for 64 TB rather than 1 TB. That could easily result in the
multi-day estimate.

Other nits:
* thread - threading might be slightly more efficient than 
  spawning full processes
* gtod_reduce=1 - precision latency measurements don't matter for this
* refill_buffers - presuming you don't care about the data contents,
  don't include this. zero_buffers is the simplest/fastest, unless you're
  concerned that the device might do compression or zero detection
* norandommap - if you want it to hit each LBA a precise number
  of times, you can't include this; fio won't remember what it's 
  done. There is a lot of overhead in keeping track, though.
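
Putting those pieces together, an (untested) sketch of the preload job,
assuming the 12-core/0-11 example above; the numjobs * size * loops
multiplication described earlier still decides how much data gets written:

[global]
name=4k random write preload
filename=/dev/nvme0n1
ioengine=libaio
direct=1
# threads instead of forked processes
thread=1
# skip fine-grained latency bookkeeping
gtod_reduce=1
# cheapest buffer fill (avoid if the drive compresses or dedups)
zero_buffers
bs=4k
rw=randwrite
iodepth=16
numjobs=12
cpus_allowed=0-11
cpus_allowed_policy=split
randrepeat=0
# total written = numjobs * size * loops
size=100%
loops=1

[job1]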





* Re: CPUs, threads, and speed
  2020-01-15 21:33 ` Elliott, Robert (Servers)
@ 2020-01-15 22:39   ` Mauricio Tavares
  2020-01-16  0:49     ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Mauricio Tavares @ 2020-01-15 22:39 UTC (permalink / raw)
  To: Elliott, Robert (Servers); +Cc: fio

On Wed, Jan 15, 2020 at 4:33 PM Elliott, Robert (Servers)
<elliott@hpe.com> wrote:
>
>
>
> > -----Original Message-----
> > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > Mauricio Tavares
> > Sent: Wednesday, January 15, 2020 9:51 AM
> > Subject: CPUs, threads, and speed
> >
> ...
> > [global]
> > name=4k random write 4 ios in the queue in 32 queues
> > filename=/dev/nvme0n1
> > ioengine=libaio
> > direct=1
> > bs=4k
> > rw=randwrite
> > iodepth=4
> > numjobs=32
> > buffered=0
> > size=100%
> > loops=2
> > randrepeat=0
> > norandommap
> > refill_buffers
> >
> > [job1]
> >
> > That is taking a ton of time, like days to go. Is there anything I can
> > do to speed it up? For instance, what is the default value for
> > cpus_allowed (or cpumask)[2]? Is it all CPUs? If not what would I gain
> > by throwing more cpus at the problem?
> >
> > I also read[2] by default fio uses fork. What would I get by going to
> > threads?
>
> > Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
>
> 77 kIOPs for random writes isn't bad - check your drive data sheet.
> If the drive is 1 TB, it should take
>     1 TB / (77k * 4 KiB) = 3170 s = 52.8 minutes
> to write the whole drive.
>
      Since the drive is 4TB, we are talking about 3.5h to complete
the task, right?

> Best practice is to use all CPU cores, lock threads to cores, and
> be NUMA aware. If the device is attached to physical CPU 0 and that CPU
> has 12 cores known to linux as 0-11 (per "lscpu" or "numactl --hardware"),

I have two CPUs with 16 cores each; I thought that meant numjobs=32.
If I was wrong, I learned something new!

> try:
>   iodepth=16
>   numjobs=12
>   cpus_allowed=0-11
>   cpus_allowed_policy=split
>
> Based on these:
>   numjobs=32, size=100%, loops=2
> fio will run each job for that many bytes, so a 1 TB drive will result
> in IOs for 64 TB rather than 1 TB. That could easily result in the
> multi-day estimate.
>
      Let's see if I understand this: your 64TB number came from 32*1TB*1*2?

> Other nits:
> * thread - threading might be slightly more efficient than
>   spawning full processes
> * gtod_reduce=1 - precision latency measurements don't matter for this
> * refill_buffers - presuming you don't care about the data contents,
>   don't include this. zero_buffers is the simplest/fastest, unless you're
>   concerned that the device might do compression or zero detection
> * norandommap - if you want it to hit each LBA a precise number
>   of times, you can't include this; fio won't remember what it's
>   done. There is a lot of overhead in keeping track, though.
>



* Re: CPUs, threads, and speed
  2020-01-15 22:39   ` Mauricio Tavares
@ 2020-01-16  0:49     ` Ming Lei
  0 siblings, 0 replies; 18+ messages in thread
From: Ming Lei @ 2020-01-16  0:49 UTC (permalink / raw)
  To: Mauricio Tavares; +Cc: Elliott, Robert (Servers), fio

On Thu, Jan 16, 2020 at 8:15 AM Mauricio Tavares <raubvogel@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 4:33 PM Elliott, Robert (Servers)
> <elliott@hpe.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > Mauricio Tavares
> > > Sent: Wednesday, January 15, 2020 9:51 AM
> > > Subject: CPUs, threads, and speed
> > >
> > ...
> > > [global]
> > > name=4k random write 4 ios in the queue in 32 queues
> > > filename=/dev/nvme0n1
> > > ioengine=libaio
> > > direct=1
> > > bs=4k
> > > rw=randwrite
> > > iodepth=4
> > > numjobs=32
> > > buffered=0
> > > size=100%
> > > loops=2
> > > randrepeat=0
> > > norandommap
> > > refill_buffers
> > >
> > > [job1]
> > >
> > > That is taking a ton of time, like days to go. Is there anything I can
> > > do to speed it up? For instance, what is the default value for
> > > cpus_allowed (or cpumask)[2]? Is it all CPUs? If not what would I gain
> > > by throwing more cpus at the problem?
> > >
> > > I also read[2] by default fio uses fork. What would I get by going to
> > > threads?
> >
> > > Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
> >
> > 77 kIOPs for random writes isn't bad - check your drive data sheet.
> > If the drive is 1 TB, it should take
> >     1 TB / (77k * 4 KiB) = 3170 s = 52.8 minutes
> > to write the whole drive.
> >
>       Since the drive is 4TB, we are talking about 3.5h to complete
> the task, right?
>
> > Best practice is to use all CPU cores, lock threads to cores, and
> > be NUMA aware. If the device is attached to physical CPU 0 and that CPU
> > has 12 cores known to linux as 0-11 (per "lscpu" or "numactl --hardware"),
>
> I have two CPUs with 16 cores each; I thought that meant numjobs=32.
> If Iw as wrong, I learned something new!

I'm not sure 32 jobs is a good idea: each job runs the random write over the
whole drive, so the drive will be written 32 times. Does that preconditioning
need to run random writes over the drive so many times?

Thanks,
Ming Lei



* Re: CPUs, threads, and speed
  2020-01-15 20:36         ` Mauricio Tavares
@ 2020-01-16  6:59           ` Andrey Kuzmin
  2020-01-16 16:12             ` Mauricio Tavares
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Kuzmin @ 2020-01-16  6:59 UTC (permalink / raw)
  To: Mauricio Tavares; +Cc: Gruher, Joseph R, fio

On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > > <joseph.r.gruher@intel.com> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > > > Mauricio Tavares
> > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > To: fio@vger.kernel.org
> > > > > > Subject: CPUs, threads, and speed
> > > > > >
> > > > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > > > _4KRandom_NVMe.ini)
> > > > > >
> > > > > > [global]
> > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > filename=/dev/nvme0n1
> > > > > > ioengine=libaio
> > > > > > direct=1
> > > > > > bs=4k
> > > > > > rw=randwrite
> > > > > > iodepth=4
> > > > > > numjobs=32
> > > > > > buffered=0
> > > > > > size=100%
> > > > > > loops=2
> > > > > > randrepeat=0
> > > > > > norandommap
> > > > > > refill_buffers
> > > > > >
> > > > > > [job1]
> > > > > >
> > > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > > up?
> > > > >
> > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > >
> > > > I believe that preload here means what in SSD world is called drive
> > > > preconditioning. It means bringing a fresh drive into steady mode
> > > > where it gives you the true performance in production over months of
> > > > use rather than the unrealistic fresh drive random write IOPS.
> > > >
> > > > > A sequential workload with larger blocks will be faster,
> > > >
> > > > No, you cannot get the job done by sequential writes since it doesn't
> > > > populate FTL translation tables like random writes do.
> > > >
> > > > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > > > worth of random writes. At today speeds, that should take just a
> > > > couple of hours.
> > > >
> > >       When you say 2xcapacity worth of random writes, do you mean just
> > > setting size=200%?
> >
> > Right.
> >
>       Then I wonder what I am doing wrong now. I changed the config file to
>
> [root@testbox tests]# cat preload.conf
> [global]
> name=4k random write 4 ios in the queue in 32 queues
> ioengine=libaio
> direct=1
> bs=4k
> rw=randwrite
> iodepth=4
> numjobs=32
> buffered=0
> size=200%
> loops=2
> random_generator=tausworthe64
> thread=1
>
> [job1]
> filename=/dev/nvme0n1
> [root@testbox tests]#
>
> but when I run it, now it spits out much larger eta times:
>
> Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
> 16580099d:14h:55m:27s]]

 Size is set on a per-thread basis, so you're doing 32 x 200% x 2 loops = 128
drive capacities here.

Also, using 32 threads doesn't improve anything. 2 (or even one)
threads with qd=128 will push the drive to its limits.
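
In other words, a sketch along these lines (untested; the single job and
qd=128 are the point, the rest mirrors your file):

[global]
name=4k random write precondition
ioengine=libaio
direct=1
thread=1
bs=4k
rw=randwrite
iodepth=128
numjobs=1
buffered=0
# with a single job, size=200% alone already gives the 2x capacity
size=200%
random_generator=tausworthe64

[job1]
filename=/dev/nvme0n1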

Regards,
Andrey
>
> Compare with what I was getting with size=100%
>
>  Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
>
> > Regards,
> > Andrey
> > >
> > > > Regards,
> > > > Andrey
> > > >
> > > > > like:
> > > > >
> > > > > [global]
> > > > > ioengine=libaio
> > > > > thread=1
> > > > > direct=1
> > > > > bs=128k
> > > > > rw=write
> > > > > numjobs=1
> > > > > iodepth=128
> > > > > size=100%
> > > > > loops=2
> > > > > [job00]
> > > > > filename=/dev/nvme0n1
> > > > >
> > > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > > >
> > > > > -Joe



* Re: CPUs, threads, and speed
  2020-01-16  6:59           ` Andrey Kuzmin
@ 2020-01-16 16:12             ` Mauricio Tavares
  2020-01-16 17:03               ` Andrey Kuzmin
  2020-01-16 17:25               ` Jared Walton
  0 siblings, 2 replies; 18+ messages in thread
From: Mauricio Tavares @ 2020-01-16 16:12 UTC (permalink / raw)
  To: fio

On Thu, Jan 16, 2020 at 2:00 AM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > > > <joseph.r.gruher@intel.com> wrote:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > > > > Mauricio Tavares
> > > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > > To: fio@vger.kernel.org
> > > > > > > Subject: CPUs, threads, and speed
> > > > > > >
> > > > > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > > > > _4KRandom_NVMe.ini)
> > > > > > >
> > > > > > > [global]
> > > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > > filename=/dev/nvme0n1
> > > > > > > ioengine=libaio
> > > > > > > direct=1
> > > > > > > bs=4k
> > > > > > > rw=randwrite
> > > > > > > iodepth=4
> > > > > > > numjobs=32
> > > > > > > buffered=0
> > > > > > > size=100%
> > > > > > > loops=2
> > > > > > > randrepeat=0
> > > > > > > norandommap
> > > > > > > refill_buffers
> > > > > > >
> > > > > > > [job1]
> > > > > > >
> > > > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > > > up?
> > > > > >
> > > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > > >
> > > > > I believe that preload here means what in SSD world is called drive
> > > > > preconditioning. It means bringing a fresh drive into steady mode
> > > > > where it gives you the true performance in production over months of
> > > > > use rather than the unrealistic fresh drive random write IOPS.
> > > > >
> > > > > > A sequential workload with larger blocks will be faster,
> > > > >
> > > > > No, you cannot get the job done by sequential writes since it doesn't
> > > > > populate FTL translation tables like random writes do.
> > > > >
> > > > > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > > > > worth of random writes. At today speeds, that should take just a
> > > > > couple of hours.
> > > > >
> > > >       When you say 2xcapacity worth of random writes, do you mean just
> > > > setting size=200%?
> > >
> > > Right.
> > >
> >       Then I wonder what I am doing wrong now. I changed the config file to
> >
> > [root@testbox tests]# cat preload.conf
> > [global]
> > name=4k random write 4 ios in the queue in 32 queues
> > ioengine=libaio
> > direct=1
> > bs=4k
> > rw=randwrite
> > iodepth=4
> > numjobs=32
> > buffered=0
> > size=200%
> > loops=2
> > random_generator=tausworthe64
> > thread=1
> >
> > [job1]
> > filename=/dev/nvme0n1
> > [root@testbox tests]#
> >
> > but when I run it, now it spits out much larger eta times:
> >
> > Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
> > 16580099d:14h:55m:27s]]
>
>  Size is set on per thread basis, so you're doing 32x200%x2 loops=128
> drive capacities here.
>
> Also, using 32 threads doesn't improve anything. 2 (and even one)
> threads with qd=128 will push the drive
> to its limits.
>
     Update: so I redid the config file a bit to pass some of the
arguments from the command line, and cut down the number of jobs and loops.
Then I ran it again, this time as a sequential write to a drive I have not
touched, to see how fast it would go. My ETA is still
astronomical:

[root@testbox tests]# cat preload_fio.conf
[global]
name=4k random
ioengine=${ioengine}
direct=1
bs=${bs_size}
rw=${iotype}
iodepth=4
numjobs=1
buffered=0
size=200%
loops=1

[job1]
filename=${devicename}
[root@testbox tests]# devicename=/dev/nvme1n1 ioengine=libaio
iotype=write bs_size=128k ~/dev/fio/fio ./preload_fio.conf
job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T)
128KiB-128KiB, ioengine=libaio, iodepth=4
fio-3.17-68-g3f1e
Starting 1 process
Jobs: 1 (f=1): [W(1)][0.0%][w=1906MiB/s][w=15.2k IOPS][eta 108616d:00h:00m:24s]

> Regards,
> Andrey
> >
> > Compare with what I was getting with size=100%
> >
> >  Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
> >
> > > Regards,
> > > Andrey
> > > >
> > > > > Regards,
> > > > > Andrey
> > > > >
> > > > > > like:
> > > > > >
> > > > > > [global]
> > > > > > ioengine=libaio
> > > > > > thread=1
> > > > > > direct=1
> > > > > > bs=128k
> > > > > > rw=write
> > > > > > numjobs=1
> > > > > > iodepth=128
> > > > > > size=100%
> > > > > > loops=2
> > > > > > [job00]
> > > > > > filename=/dev/nvme0n1
> > > > > >
> > > > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > > > >
> > > > > > -Joe



* Re: CPUs, threads, and speed
  2020-01-16 16:12             ` Mauricio Tavares
@ 2020-01-16 17:03               ` Andrey Kuzmin
  2020-01-16 17:25               ` Jared Walton
  1 sibling, 0 replies; 18+ messages in thread
From: Andrey Kuzmin @ 2020-01-16 17:03 UTC (permalink / raw)
  To: Mauricio Tavares; +Cc: fio

On Thu, Jan 16, 2020 at 7:13 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 2:00 AM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > > > > <joseph.r.gruher@intel.com> wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > > > > > Mauricio Tavares
> > > > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > > > To: fio@vger.kernel.org
> > > > > > > > Subject: CPUs, threads, and speed
> > > > > > > >
> > > > > > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > > > > > _4KRandom_NVMe.ini)
> > > > > > > >
> > > > > > > > [global]
> > > > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > > > filename=/dev/nvme0n1
> > > > > > > > ioengine=libaio
> > > > > > > > direct=1
> > > > > > > > bs=4k
> > > > > > > > rw=randwrite
> > > > > > > > iodepth=4
> > > > > > > > numjobs=32
> > > > > > > > buffered=0
> > > > > > > > size=100%
> > > > > > > > loops=2
> > > > > > > > randrepeat=0
> > > > > > > > norandommap
> > > > > > > > refill_buffers
> > > > > > > >
> > > > > > > > [job1]
> > > > > > > >
> > > > > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > > > > up?
> > > > > > >
> > > > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > > > >
> > > > > > I believe that preload here means what in SSD world is called drive
> > > > > > preconditioning. It means bringing a fresh drive into steady mode
> > > > > > where it gives you the true performance in production over months of
> > > > > > use rather than the unrealistic fresh drive random write IOPS.
> > > > > >
> > > > > > > A sequential workload with larger blocks will be faster,
> > > > > >
> > > > > > No, you cannot get the job done by sequential writes since it doesn't
> > > > > > populate FTL translation tables like random writes do.
> > > > > >
> > > > > > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > > > > > worth of random writes. At today speeds, that should take just a
> > > > > > couple of hours.
> > > > > >
> > > > >       When you say 2xcapacity worth of random writes, do you mean just
> > > > > setting size=200%?
> > > >
> > > > Right.
> > > >
> > >       Then I wonder what I am doing wrong now. I changed the config file to
> > >
> > > [root@testbox tests]# cat preload.conf
> > > [global]
> > > name=4k random write 4 ios in the queue in 32 queues
> > > ioengine=libaio
> > > direct=1
> > > bs=4k
> > > rw=randwrite
> > > iodepth=4
> > > numjobs=32
> > > buffered=0
> > > size=200%
> > > loops=2
> > > random_generator=tausworthe64
> > > thread=1
> > >
> > > [job1]
> > > filename=/dev/nvme0n1
> > > [root@testbox tests]#
> > >
> > > but when I run it, now it spits out much larger eta times:
> > >
> > > Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
> > > 16580099d:14h:55m:27s]]
> >
> >  Size is set on per thread basis, so you're doing 32x200%x2 loops=128
> > drive capacities here.
> >
> > Also, using 32 threads doesn't improve anything. 2 (and even one)
> > threads with qd=128 will push the drive
> > to its limits.
> >
>      Update: so I redid the config file a bit to pass some of the
> arguments from command line, and cut down number of jobs and loops.
> And I ran it again, this time sequential write to the drive I have not
> touched to see how fast it was going to go. My eta is still
> astronomical:
>
> [root@testbox tests]# cat preload_fio.conf
> [global]
> name=4k random
> ioengine=${ioengine}
> direct=1
> bs=${bs_size}
> rw=${iotype}
> iodepth=4
> numjobs=1
> buffered=0
> size=200%
> loops=1
>
> [job1]
> filename=${devicename}
> [root@testbox tests]# devicename=/dev/nvme1n1 ioengine=libaio
> iotype=write bs_size=128k ~/dev/fio/fio ./preload_fio.conf
> job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T)
> 128KiB-128KiB, ioengine=libaio, iodepth=4
> fio-3.17-68-g3f1e
> Starting 1 process
> Jobs: 1 (f=1): [W(1)][0.0%][w=1906MiB/s][w=15.2k IOPS][eta 108616d:00h:00m:24s]

At almost 2 GB/s, your run against a 4TB drive should take about an hour.
I'm not sure if fio properly reports ETA for size-based jobs at all.
You should check with Jens.
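
As a rough size-based cross-check (assuming the 4TB drive mentioned earlier
and size=200%, i.e. about 8 TB to write):
    8 TB / (15.2k * 128 KiB) = ~4000 s = ~67 minutes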

Regards,
Andrey

>
> > Regards,
> > Andrey
> > >
> > > Compare with what I was getting with size=100%
> > >
> > >  Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
> > >
> > > > Regards,
> > > > Andrey
> > > > >
> > > > > > Regards,
> > > > > > Andrey
> > > > > >
> > > > > > > like:
> > > > > > >
> > > > > > > [global]
> > > > > > > ioengine=libaio
> > > > > > > thread=1
> > > > > > > direct=1
> > > > > > > bs=128k
> > > > > > > rw=write
> > > > > > > numjobs=1
> > > > > > > iodepth=128
> > > > > > > size=100%
> > > > > > > loops=2
> > > > > > > [job00]
> > > > > > > filename=/dev/nvme0n1
> > > > > > >
> > > > > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > > > > >
> > > > > > > -Joe



* Re: CPUs, threads, and speed
  2020-01-16 16:12             ` Mauricio Tavares
  2020-01-16 17:03               ` Andrey Kuzmin
@ 2020-01-16 17:25               ` Jared Walton
  2020-01-16 18:39                 ` Andrey Kuzmin
  1 sibling, 1 reply; 18+ messages in thread
From: Jared Walton @ 2020-01-16 17:25 UTC (permalink / raw)
  To: Mauricio Tavares; +Cc: fio

Not sure if this will help, but I use the following to prep multiple
4TB drives at the same time in a little over an hour. Is it inelegant?
Yes, but it works for me.

globalFIOParameters="--offset=0 --ioengine=libaio --invalidate=1
--group_reporting --direct=1 --thread --refill_buffers --norandommap
--randrepeat=0 --allow_mounted_write=1 --output-format=json,normal"

# Drives should be FOB or LLF'd (if it's good to do that)
# LLF logic

# 128k Pre-Condition
# Write to entire disk
for i in `ls -1 /dev/nvme*n1`
do
    size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
    ./fio --name=PreconditionPass1of3 --filename=${i} --iodepth=$iodepth \
        --bs=128k --rw=write --size=${size} --fill_device=1 \
        $globalFIOParameters &
done
wait

# Read entire disk
for i in `ls -1 /dev/nvme*n1`
do
    size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
    ./fio --name=PreconditionPass2of3 --filename=${i} --iodepth=$iodepth \
        --bs=128k --rw=read --size=${size} --fill_device=1 \
        $globalFIOParameters &
done
wait

# Write to entire disk one last time
for i in `ls -1 /dev/nvme*n1`
do
    size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
    ./fio --name=PreconditionPass3of3 --filename=${i} --iodepth=$iodepth \
        --bs=128k --rw=write --size=${size} --fill_device=1 \
        $globalFIOParameters &
done
wait


# Check 128k steady-state
for i in `ls -1 /dev/nvme*n1`
do
    ./fio --name=SteadyState --filename=${i} --iodepth=16 --numjobs=16 \
        --bs=4k --rw=read --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h \
        $globalFIOParameters &
done
wait

On Thu, Jan 16, 2020 at 9:13 AM Mauricio Tavares <raubvogel@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 2:00 AM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > > > > <joseph.r.gruher@intel.com> wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > > > > > Mauricio Tavares
> > > > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > > > To: fio@vger.kernel.org
> > > > > > > > Subject: CPUs, threads, and speed
> > > > > > > >
> > > > > > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > > > > > _4KRandom_NVMe.ini)
> > > > > > > >
> > > > > > > > [global]
> > > > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > > > filename=/dev/nvme0n1
> > > > > > > > ioengine=libaio
> > > > > > > > direct=1
> > > > > > > > bs=4k
> > > > > > > > rw=randwrite
> > > > > > > > iodepth=4
> > > > > > > > numjobs=32
> > > > > > > > buffered=0
> > > > > > > > size=100%
> > > > > > > > loops=2
> > > > > > > > randrepeat=0
> > > > > > > > norandommap
> > > > > > > > refill_buffers
> > > > > > > >
> > > > > > > > [job1]
> > > > > > > >
> > > > > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > > > > up?
> > > > > > >
> > > > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > > > >
> > > > > > I believe that preload here means what in SSD world is called drive
> > > > > > preconditioning. It means bringing a fresh drive into steady mode
> > > > > > where it gives you the true performance in production over months of
> > > > > > use rather than the unrealistic fresh drive random write IOPS.
> > > > > >
> > > > > > > A sequential workload with larger blocks will be faster,
> > > > > >
> > > > > > No, you cannot get the job done by sequential writes since it doesn't
> > > > > > populate FTL translation tables like random writes do.
> > > > > >
> > > > > > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > > > > > worth of random writes. At today speeds, that should take just a
> > > > > > couple of hours.
> > > > > >
> > > > >       When you say 2xcapacity worth of random writes, do you mean just
> > > > > setting size=200%?
> > > >
> > > > Right.
> > > >
> > >       Then I wonder what I am doing wrong now. I changed the config file to
> > >
> > > [root@testbox tests]# cat preload.conf
> > > [global]
> > > name=4k random write 4 ios in the queue in 32 queues
> > > ioengine=libaio
> > > direct=1
> > > bs=4k
> > > rw=randwrite
> > > iodepth=4
> > > numjobs=32
> > > buffered=0
> > > size=200%
> > > loops=2
> > > random_generator=tausworthe64
> > > thread=1
> > >
> > > [job1]
> > > filename=/dev/nvme0n1
> > > [root@testbox tests]#
> > >
> > > but when I run it, now it spits out much larger eta times:
> > >
> > > Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
> > > 16580099d:14h:55m:27s]]
> >
> >  Size is set on per thread basis, so you're doing 32x200%x2 loops=128
> > drive capacities here.
> >
> > Also, using 32 threads doesn't improve anything. 2 (and even one)
> > threads with qd=128 will push the drive
> > to its limits.
> >
>      Update: so I redid the config file a bit to pass some of the
> arguments from command line, and cut down number of jobs and loops.
> And I ran it again, this time sequential write to the drive I have not
> touched to see how fast it was going to go. My eta is still
> astronomical:
>
> [root@testbox tests]# cat preload_fio.conf
> [global]
> name=4k random
> ioengine=${ioengine}
> direct=1
> bs=${bs_size}
> rw=${iotype}
> iodepth=4
> numjobs=1
> buffered=0
> size=200%
> loops=1
>
> [job1]
> filename=${devicename}
> [root@testbox tests]# devicename=/dev/nvme1n1 ioengine=libaio
> iotype=write bs_size=128k ~/dev/fio/fio ./preload_fio.conf
> job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T)
> 128KiB-128KiB, ioengine=libaio, iodepth=4
> fio-3.17-68-g3f1e
> Starting 1 process
> Jobs: 1 (f=1): [W(1)][0.0%][w=1906MiB/s][w=15.2k IOPS][eta 108616d:00h:00m:24s]
>
> > Regards,
> > Andrey
> > >
> > > Compare with what I was getting with size=100%
> > >
> > >  Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
> > >
> > > > Regards,
> > > > Andrey
> > > > >
> > > > > > Regards,
> > > > > > Andrey
> > > > > >
> > > > > > > like:
> > > > > > >
> > > > > > > [global]
> > > > > > > ioengine=libaio
> > > > > > > thread=1
> > > > > > > direct=1
> > > > > > > bs=128k
> > > > > > > rw=write
> > > > > > > numjobs=1
> > > > > > > iodepth=128
> > > > > > > size=100%
> > > > > > > loops=2
> > > > > > > [job00]
> > > > > > > filename=/dev/nvme0n1
> > > > > > >
> > > > > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > > > > >
> > > > > > > -Joe



* Re: CPUs, threads, and speed
  2020-01-16 17:25               ` Jared Walton
@ 2020-01-16 18:39                 ` Andrey Kuzmin
  2020-01-16 19:03                   ` Jared Walton
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Kuzmin @ 2020-01-16 18:39 UTC (permalink / raw)
  To: Jared Walton; +Cc: Mauricio Tavares, fio

On Thu, Jan 16, 2020 at 9:31 PM Jared Walton <jawalking@gmail.com> wrote:
>
> Not sure if this will help, but I use the following to prep multiple
> 4TB drives at the same time in a little over an hour.

You seem to be preconditioning with sequential writes only, and
further doing so with essentially a single write frontier.

That doesn't stress the FTL maps enough and doesn't trigger any substantial
garbage collection, since the SSD is intelligent enough to spot a sequential
write workload made up of 128K sequential (re)writes.

So what you're doing is only good for bandwidth measurements, and if
this steady state is applied to random IOPS profiling, you'll be getting
highly inflated results.

Regards,
Andrey

> Is it inelegant, yes, but it works for me.
>
> globalFIOParameters="--offset=0 --ioengine=libaio --invalidate=1
> --group_reporting --direct=1 --thread --refill_buffers --norandommap
> --randrepeat=0 --allow_mounted_write=1 --output-format=json,normal"
>
> # Drives should be FOB or LLF'd (if it's good to do that)
> # LLF logic
>
> # 128k Pre-Condition
> # Write to entire disk
> for i in `ls -1 /dev/nvme*n1`
> do
>     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{
> print $1 }')
> ./fio --name=PreconditionPass1of3 --filename=${i} --iodepth=$iodepth
> --bs=128k --rw=write --size=${size} --fill_device=1
> $globalFIOParameters &
> done
> wait
>
> # Read entire disk
> for i in `ls -1 /dev/nvme*n1`
> do
>     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{
> print $1 }')
> ./fio --name=PreconditionPass2of3 --filename=${i} --iodepth=$iodepth
> --bs=128k --rw=read --size=${size} --fill_device=1
> $globalFIOParameters &
> done
> wait
>
> # Write to entire disk one last time
> for i in `ls -1 /dev/nvme*n1`
> do
>     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{
> print $1 }')
> ./fio --name=PreconditionPass3of3 --filename=${i} --iodepth=$iodepth
> --bs=128k --rw=write --size=${size} --fill_device=1
> $globalFIOParameters &
> done
> wait
>
>
> # Check 128k steady-state
> for i in `ls -1 /dev/nvme*n1`
> do
> ./fio --name=SteadyState --filename=${i} --iodepth=16 --numjobs=16
> --bs=4k --rw=read --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h
> $globalFIOParameters &
> done
> wait
>
> On Thu, Jan 16, 2020 at 9:13 AM Mauricio Tavares <raubvogel@gmail.com> wrote:
> >
> > On Thu, Jan 16, 2020 at 2:00 AM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > >
> > > On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > > > > > <joseph.r.gruher@intel.com> wrote:
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf Of
> > > > > > > > > Mauricio Tavares
> > > > > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > > > > To: fio@vger.kernel.org
> > > > > > > > > Subject: CPUs, threads, and speed
> > > > > > > > >
> > > > > > > > > Let's say I have a config file to preload drive that looks like this (stolen from
> > > > > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill
> > > > > > > > > _4KRandom_NVMe.ini)
> > > > > > > > >
> > > > > > > > > [global]
> > > > > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > > > > filename=/dev/nvme0n1
> > > > > > > > > ioengine=libaio
> > > > > > > > > direct=1
> > > > > > > > > bs=4k
> > > > > > > > > rw=randwrite
> > > > > > > > > iodepth=4
> > > > > > > > > numjobs=32
> > > > > > > > > buffered=0
> > > > > > > > > size=100%
> > > > > > > > > loops=2
> > > > > > > > > randrepeat=0
> > > > > > > > > norandommap
> > > > > > > > > refill_buffers
> > > > > > > > >
> > > > > > > > > [job1]
> > > > > > > > >
> > > > > > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > > > > > up?
> > > > > > > >
> > > > > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > > > > >
> > > > > > > I believe that preload here means what in SSD world is called drive
> > > > > > > preconditioning. It means bringing a fresh drive into steady mode
> > > > > > > where it gives you the true performance in production over months of
> > > > > > > use rather than the unrealistic fresh drive random write IOPS.
> > > > > > >
> > > > > > > > A sequential workload with larger blocks will be faster,
> > > > > > >
> > > > > > > No, you cannot get the job done by sequential writes since it doesn't
> > > > > > > populate FTL translation tables like random writes do.
> > > > > > >
> > > > > > > As to taking a ton, the rule of thumb is to give the SSD 2xcapacity
> > > > > > > worth of random writes. At today speeds, that should take just a
> > > > > > > couple of hours.
> > > > > > >
> > > > > >       When you say 2xcapacity worth of random writes, do you mean just
> > > > > > setting size=200%?
> > > > >
> > > > > Right.
> > > > >
> > > >       Then I wonder what I am doing wrong now. I changed the config file to
> > > >
> > > > [root@testbox tests]# cat preload.conf
> > > > [global]
> > > > name=4k random write 4 ios in the queue in 32 queues
> > > > ioengine=libaio
> > > > direct=1
> > > > bs=4k
> > > > rw=randwrite
> > > > iodepth=4
> > > > numjobs=32
> > > > buffered=0
> > > > size=200%
> > > > loops=2
> > > > random_generator=tausworthe64
> > > > thread=1
> > > >
> > > > [job1]
> > > > filename=/dev/nvme0n1
> > > > [root@testbox tests]#
> > > >
> > > > but when I run it, now it spits out much larger eta times:
> > > >
> > > > Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
> > > > 16580099d:14h:55m:27s]]
> > >
> > >  Size is set on per thread basis, so you're doing 32x200%x2 loops=128
> > > drive capacities here.
> > >
> > > Also, using 32 threads doesn't improve anything. 2 (and even one)
> > > threads with qd=128 will push the drive
> > > to its limits.
> > >
> >      Update: so I redid the config file a bit to pass some of the
> > arguments from command line, and cut down number of jobs and loops.
> > And I ran it again, this time sequential write to the drive I have not
> > touched to see how fast it was going to go. My eta is still
> > astronomical:
> >
> > [root@testbox tests]# cat preload_fio.conf
> > [global]
> > name=4k random
> > ioengine=${ioengine}
> > direct=1
> > bs=${bs_size}
> > rw=${iotype}
> > iodepth=4
> > numjobs=1
> > buffered=0
> > size=200%
> > loops=1
> >
> > [job1]
> > filename=${devicename}
> > [root@testbox tests]# devicename=/dev/nvme1n1 ioengine=libaio
> > iotype=write bs_size=128k ~/dev/fio/fio ./preload_fio.conf
> > job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T)
> > 128KiB-128KiB, ioengine=libaio, iodepth=4
> > fio-3.17-68-g3f1e
> > Starting 1 process
> > Jobs: 1 (f=1): [W(1)][0.0%][w=1906MiB/s][w=15.2k IOPS][eta 108616d:00h:00m:24s]
> >
> > > Regards,
> > > Andrey
> > > >
> > > > Compare with what I was getting with size=100%
> > > >
> > > >  Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
> > > >
> > > > > Regards,
> > > > > Andrey
> > > > > >
> > > > > > > Regards,
> > > > > > > Andrey
> > > > > > >
> > > > > > > > like:
> > > > > > > >
> > > > > > > > [global]
> > > > > > > > ioengine=libaio
> > > > > > > > thread=1
> > > > > > > > direct=1
> > > > > > > > bs=128k
> > > > > > > > rw=write
> > > > > > > > numjobs=1
> > > > > > > > iodepth=128
> > > > > > > > size=100%
> > > > > > > > loops=2
> > > > > > > > [job00]
> > > > > > > > filename=/dev/nvme0n1
> > > > > > > >
> > > > > > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see improvement in performance, and you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > > > > > >
> > > > > > > > -Joe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CPUs, threads, and speed
  2020-01-16 18:39                 ` Andrey Kuzmin
@ 2020-01-16 19:03                   ` Jared Walton
  2020-01-17 22:08                     ` Matthew Eaton
  0 siblings, 1 reply; 18+ messages in thread
From: Jared Walton @ 2020-01-16 19:03 UTC (permalink / raw)
  To: Andrey Kuzmin; +Cc: Mauricio Tavares, fio

Correct, I pre-condition for IOPS testing by utilizing the last block,
only using randwrite, which will run random writes for about 45 min
until a steady state is achieved.
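
A minimal sketch of what that last block looks like switched over to
randwrite (device name, queue depth, job count and runtime cap below
are placeholder assumptions, not exact values):

# Hypothetical randwrite variant of the steady-state check; stops when
# the IOPS slope over a 30-minute window stays within 0.3%.
./fio --name=SteadyStateRandWrite --filename=/dev/nvme0n1 \
    --ioengine=libaio --direct=1 --thread --group_reporting \
    --norandommap --randrepeat=0 \
    --bs=4k --rw=randwrite --iodepth=32 --numjobs=4 \
    --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h \
    --output-format=json,normal

The runtime=24h is only a safety cap; in practice the ss criterion
terminates the run much earlier, around the 45-minute mark.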

On Thu, Jan 16, 2020 at 11:40 AM Andrey Kuzmin
<andrey.v.kuzmin@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 9:31 PM Jared Walton <jawalking@gmail.com> wrote:
> >
> > Not sure if this will help, but I use the following to prep multiple
> > 4TB drives at the same time in a little over an hour.
>
> You seem to be preconditioning with sequential writes only, and
> further doing so
> with essentially single write frontier.
>
> That doesn't stress FTL maps enough and doesn't trigger any substantial garbage
> collection since SSD is intelligent enough to spot sequential write
> workload with
> 128K sequential (re)writes.
>
> So what you're doing is only good for bandwidth measurements, and if
> this steady
> state is applied to random IOPS profiling, you'd be getting highly
> inflated results.
>
> Regards,
> Andrey
>
> > Is it inelegant, yes, but it works for me.
> >
> > globalFIOParameters="--offset=0 --ioengine=libaio --invalidate=1
> > --group_reporting --direct=1 --thread --refill_buffers --norandommap
> > --randrepeat=0 --allow_mounted_write=1 --output-format=json,normal"
> >
> > # Drives should be FOB or LLF'd (if it's good to do that)
> > # LLF logic
> >
> > # 128k Pre-Condition
> > # Write to entire disk
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{
> > print $1 }')
> > ./fio --name=PreconditionPass1of3 --filename=${i} --iodepth=$iodepth
> > --bs=128k --rw=write --size=${size} --fill_device=1
> > $globalFIOParameters &
> > done
> > wait
> >
> > # Read entire disk
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{
> > print $1 }')
> > ./fio --name=PreconditionPass2of3 --filename=${i} --iodepth=$iodepth
> > --bs=128k --rw=read --size=${size} --fill_device=1
> > $globalFIOParameters &
> > done
> > wait
> >
> > # Write to entire disk one last time
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{
> > print $1 }')
> > ./fio --name=PreconditionPass3of3 --filename=${i} --iodepth=$iodepth
> > --bs=128k --rw=write --size=${size} --fill_device=1
> > $globalFIOParameters &
> > done
> > wait
> >
> >
> > # Check 128k steady-state
> > for i in `ls -1 /dev/nvme*n1`
> > do
> > ./fio --name=SteadyState --filename=${i} --iodepth=16 --numjobs=16
> > --bs=4k --rw=read --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h
> > $globalFIOParameters &
> > done
> > wait
> >
[...]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CPUs, threads, and speed
  2020-01-16 19:03                   ` Jared Walton
@ 2020-01-17 22:08                     ` Matthew Eaton
  2020-01-24 20:39                       ` Mauricio Tavares
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Eaton @ 2020-01-17 22:08 UTC (permalink / raw)
  To: Jared Walton; +Cc: Andrey Kuzmin, Mauricio Tavares, fio

On Thu, Jan 16, 2020 at 11:04 AM Jared Walton <jawalking@gmail.com> wrote:
>
> Correct, I pre-condition for IOPS testing by utilizing the last block,
> only using randwrite, which will run random writes for about 45 min
> until a steady state is achieved.
>
[...]

I have pretty much standardized on two sequential drive writes and
four random drive writes to get to steady state. It may be overkill,
but it has worked well for me since we test a variety of SSDs and some
reach steady state faster than others.
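
As a rough sketch only (assuming libaio against a raw namespace, with
a placeholder device name and queue depth), that sequence can be
written as two stonewalled fio sections:

[global]
# settings shared by both passes
ioengine=libaio
direct=1
thread=1
filename=/dev/nvme0n1

[seq-pass]
# two full sequential drive writes
bs=128k
rw=write
iodepth=128
size=100%
loops=2

[rand-pass]
# four full random drive writes; stonewall waits for seq-pass to finish
stonewall
bs=4k
rw=randwrite
iodepth=128
size=100%
loops=4

With numjobs left at the default of 1, size=100% is one drive capacity
per pass, so loops=2 and loops=4 map directly onto the two sequential
and four random drive writes.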


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CPUs, threads, and speed
  2020-01-17 22:08                     ` Matthew Eaton
@ 2020-01-24 20:39                       ` Mauricio Tavares
  0 siblings, 0 replies; 18+ messages in thread
From: Mauricio Tavares @ 2020-01-24 20:39 UTC (permalink / raw)
  To: fio

On Fri, Jan 17, 2020 at 5:08 PM Matthew Eaton <m.eaton82@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 11:04 AM Jared Walton <jawalking@gmail.com> wrote:
> >
> > Correct, I pre-condition for IOPS testing by utilizing the the last if
> > block, only using randwrite. which will run random writes for about
> > 45min, until a steady state is achieved.
> >
[...]
>
> I have pretty much standardized on two sequential drive writes and
> four random drive writes to get to steady state. It may be overkill
> but has worked well for me since we test a variety of SSDs and some
> reach steady state faster than others.

Thank you for all the replies, and to Jared for the script, which leads
me to one related question: even though it uses steady state as the
criterion for considering the drive preconditioned, am I correct to
assume that this steady state does not necessarily have any relationship
with the steady state at which I want to take the measurements when
actually running the tests?


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-01-24 20:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 15:50 CPUs, threads, and speed Mauricio Tavares
2020-01-15 17:28 ` Gruher, Joseph R
2020-01-15 18:04   ` Andrey Kuzmin
2020-01-15 18:29     ` Mauricio Tavares
2020-01-15 19:00       ` Andrey Kuzmin
2020-01-15 20:36         ` Mauricio Tavares
2020-01-16  6:59           ` Andrey Kuzmin
2020-01-16 16:12             ` Mauricio Tavares
2020-01-16 17:03               ` Andrey Kuzmin
2020-01-16 17:25               ` Jared Walton
2020-01-16 18:39                 ` Andrey Kuzmin
2020-01-16 19:03                   ` Jared Walton
2020-01-17 22:08                     ` Matthew Eaton
2020-01-24 20:39                       ` Mauricio Tavares
2020-01-15 18:33   ` Kudryavtsev, Andrey O
2020-01-15 21:33 ` Elliott, Robert (Servers)
2020-01-15 22:39   ` Mauricio Tavares
2020-01-16  0:49     ` Ming Lei
