From: Matthew Eaton
Date: Fri, 17 Jan 2020 14:08:26 -0800
Subject: Re: CPUs, threads, and speed
To: Jared Walton
Cc: Andrey Kuzmin, Mauricio Tavares, fio
List-Id: fio@vger.kernel.org

On Thu, Jan 16, 2020 at 11:04 AM Jared Walton wrote:
>
> Correct, I pre-condition for IOPS testing by utilizing the last if
> block, only using randwrite, which will run random writes for about
> 45 minutes, until a steady state is achieved.
>
> On Thu, Jan 16, 2020 at 11:40 AM Andrey Kuzmin wrote:
> >
> > On Thu, Jan 16, 2020 at 9:31 PM Jared Walton wrote:
> > >
> > > Not sure if this will help, but I use the following to prep multiple
> > > 4TB drives at the same time in a little over an hour.
> >
> > You seem to be preconditioning with sequential writes only, and
> > further doing so with essentially a single write frontier.
> >
> > That doesn't stress the FTL maps enough and doesn't trigger any
> > substantial garbage collection, since the SSD is intelligent enough
> > to recognize a 128K sequential (re)write workload.
> >
> > So what you're doing is only good for bandwidth measurements; if this
> > steady state is applied to random IOPS profiling, you'd be getting
> > highly inflated results.
> >
> > Regards,
> > Andrey
> >
> > > Is it inelegant, yes, but it works for me.
> > > globalFIOParameters="--offset=0 --ioengine=libaio --invalidate=1
> > > --group_reporting --direct=1 --thread --refill_buffers --norandommap
> > > --randrepeat=0 --allow_mounted_write=1 --output-format=json,normal"
> > >
> > > # Drives should be FOB or LLF'd (if it's good to do that)
> > > # LLF logic
> > >
> > > # 128k Pre-Condition
> > > # Write to entire disk
> > > for i in `ls -1 /dev/nvme*n1`
> > > do
> > >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
> > >     ./fio --name=PreconditionPass1of3 --filename=${i} --iodepth=$iodepth \
> > >         --bs=128k --rw=write --size=${size} --fill_device=1 \
> > >         $globalFIOParameters &
> > > done
> > > wait
> > >
> > > # Read entire disk
> > > for i in `ls -1 /dev/nvme*n1`
> > > do
> > >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
> > >     ./fio --name=PreconditionPass2of3 --filename=${i} --iodepth=$iodepth \
> > >         --bs=128k --rw=read --size=${size} --fill_device=1 \
> > >         $globalFIOParameters &
> > > done
> > > wait
> > >
> > > # Write to entire disk one last time
> > > for i in `ls -1 /dev/nvme*n1`
> > > do
> > >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
> > >     ./fio --name=PreconditionPass3of3 --filename=${i} --iodepth=$iodepth \
> > >         --bs=128k --rw=write --size=${size} --fill_device=1 \
> > >         $globalFIOParameters &
> > > done
> > > wait
> > >
> > > # Check 128k steady-state
> > > for i in `ls -1 /dev/nvme*n1`
> > > do
> > >     ./fio --name=SteadyState --filename=${i} --iodepth=16 --numjobs=16 \
> > >         --bs=4k --rw=read --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h \
> > >         $globalFIOParameters &
> > > done
> > > wait
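Jared, for the randwrite pass you mention using before IOPS testing, I'm
guessing the final loop just switches to --rw=randwrite with the same
slope-based exit, something like the sketch below (untested; the device
glob, queue depth, and job count are carried over from your script rather
than anything I know about your setup):

for i in `ls -1 /dev/nvme*n1`
do
    # Hypothetical randwrite steady-state pass: 4k random writes until the
    # IOPS slope over a 30-minute window flattens out, capped at 24 hours.
    ./fio --name=SteadyStateRandWrite --filename=${i} --iodepth=16 --numjobs=16 \
        --bs=4k --rw=randwrite --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h \
        $globalFIOParameters &
done
wait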
> > > On Thu, Jan 16, 2020 at 9:13 AM Mauricio Tavares wrote:
> > > >
> > > > On Thu, Jan 16, 2020 at 2:00 AM Andrey Kuzmin wrote:
> > > > >
> > > > > On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares wrote:
> > > > > >
> > > > > > On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin wrote:
> > > > > > >
> > > > > > > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares wrote:
> > > > > > > >
> > > > > > > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R wrote:
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: fio-owner@vger.kernel.org On Behalf Of Mauricio Tavares
> > > > > > > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > > > > > > To: fio@vger.kernel.org
> > > > > > > > > > > Subject: CPUs, threads, and speed
> > > > > > > > > > >
> > > > > > > > > > > Let's say I have a config file to preload a drive that looks like this (stolen from
> > > > > > > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill_4KRandom_NVMe.ini)
> > > > > > > > > > >
> > > > > > > > > > > [global]
> > > > > > > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > > > > > > filename=/dev/nvme0n1
> > > > > > > > > > > ioengine=libaio
> > > > > > > > > > > direct=1
> > > > > > > > > > > bs=4k
> > > > > > > > > > > rw=randwrite
> > > > > > > > > > > iodepth=4
> > > > > > > > > > > numjobs=32
> > > > > > > > > > > buffered=0
> > > > > > > > > > > size=100%
> > > > > > > > > > > loops=2
> > > > > > > > > > > randrepeat=0
> > > > > > > > > > > norandommap
> > > > > > > > > > > refill_buffers
> > > > > > > > > > >
> > > > > > > > > > > [job1]
> > > > > > > > > > >
> > > > > > > > > > > That is taking a ton of time, like days to go. Is there anything I can
> > > > > > > > > > > do to speed it up?
> > > > > > > > > >
> > > > > > > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > > > > > > >
> > > > > > > > > I believe that preload here means what in the SSD world is called drive
> > > > > > > > > preconditioning. It means bringing a fresh drive into the steady mode
> > > > > > > > > where it gives you the true performance you'll see in production over
> > > > > > > > > months of use, rather than the unrealistic fresh-drive random write IOPS.
> > > > > > > > >
> > > > > > > > > > A sequential workload with larger blocks will be faster,
> > > > > > > > >
> > > > > > > > > No, you cannot get the job done by sequential writes, since they don't
> > > > > > > > > populate the FTL translation tables the way random writes do.
> > > > > > > > >
> > > > > > > > > As to taking a ton of time, the rule of thumb is to give the SSD 2x its
> > > > > > > > > capacity worth of random writes. At today's speeds, that should take
> > > > > > > > > just a couple of hours.
> > > > > > > > >
> > > > > > > > When you say 2x capacity worth of random writes, do you mean just
> > > > > > > > setting size=200%?
> > > > > > >
> > > > > > > Right.
> > > > > >
> > > > > > Then I wonder what I am doing wrong now. I changed the config file to
> > > > > >
> > > > > > [root@testbox tests]# cat preload.conf
> > > > > > [global]
> > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > ioengine=libaio
> > > > > > direct=1
> > > > > > bs=4k
> > > > > > rw=randwrite
> > > > > > iodepth=4
> > > > > > numjobs=32
> > > > > > buffered=0
> > > > > > size=200%
> > > > > > loops=2
> > > > > > random_generator=tausworthe64
> > > > > > thread=1
> > > > > >
> > > > > > [job1]
> > > > > > filename=/dev/nvme0n1
> > > > > > [root@testbox tests]#
> > > > > >
> > > > > > but when I run it, now it spits out much larger eta times:
> > > > > >
> > > > > > Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta 16580099d:14h:55m:27s]
> > > > >
> > > > > Size is set on a per-thread basis, so you're doing 32 x 200% x 2 loops = 128
> > > > > drive capacities here.
> > > > >
> > > > > Also, using 32 threads doesn't improve anything. Two (or even one)
> > > > > threads with qd=128 will push the drive to its limits.
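Mauricio, my reading of Andrey's suggestion is that the whole preload can
be a single job at a high queue depth, letting loops (not numjobs) provide
the 2x capacity. A rough sketch of such a job file, using only options
that already appear in this thread (I haven't timed it on your hardware):

# Sketch of a 2x-capacity random-write precondition: one worker at qd=128
# making two full passes over the device, instead of 32 jobs each writing 200%.
[global]
name=precondition-4k-randwrite
ioengine=libaio
direct=1
thread=1
bs=4k
rw=randwrite
iodepth=128
numjobs=1
norandommap
randrepeat=0
size=100%
loops=2

[job1]
filename=/dev/nvme0n1

size=100% with loops=2 gives the same 2x coverage as size=200%, but avoids
the per-thread multiplication that produced the 128-capacity eta above.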
> > > > Update: so I redid the config file a bit to pass some of the
> > > > arguments from the command line, and cut down the number of jobs and
> > > > loops. And I ran it again, this time as a sequential write to a drive
> > > > I have not touched, to see how fast it would go. My eta is still
> > > > astronomical:
> > > >
> > > > [root@testbox tests]# cat preload_fio.conf
> > > > [global]
> > > > name=4k random
> > > > ioengine=${ioengine}
> > > > direct=1
> > > > bs=${bs_size}
> > > > rw=${iotype}
> > > > iodepth=4
> > > > numjobs=1
> > > > buffered=0
> > > > size=200%
> > > > loops=1
> > > >
> > > > [job1]
> > > > filename=${devicename}
> > > > [root@testbox tests]# devicename=/dev/nvme1n1 ioengine=libaio iotype=write bs_size=128k ~/dev/fio/fio ./preload_fio.conf
> > > > job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=4
> > > > fio-3.17-68-g3f1e
> > > > Starting 1 process
> > > > Jobs: 1 (f=1): [W(1)][0.0%][w=1906MiB/s][w=15.2k IOPS][eta 108616d:00h:00m:24s]
> > > >
> > > > > Regards,
> > > > > Andrey
> > > > >
> > > > > > Compare with what I was getting with size=100%
> > > > > >
> > > > > > Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]
> > > > > >
> > > > > > > Regards,
> > > > > > > Andrey
> > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Andrey
> > > > > > > > >
> > > > > > > > > > like:
> > > > > > > > > >
> > > > > > > > > > [global]
> > > > > > > > > > ioengine=libaio
> > > > > > > > > > thread=1
> > > > > > > > > > direct=1
> > > > > > > > > > bs=128k
> > > > > > > > > > rw=write
> > > > > > > > > > numjobs=1
> > > > > > > > > > iodepth=128
> > > > > > > > > > size=100%
> > > > > > > > > > loops=2
> > > > > > > > > > [job00]
> > > > > > > > > > filename=/dev/nvme0n1
> > > > > > > > > >
> > > > > > > > > > Or if you have a use case where you specifically want to write it in
> > > > > > > > > > with 4K blocks, you could probably increase your queue depth well
> > > > > > > > > > beyond 4 and see an improvement in performance. You probably don't
> > > > > > > > > > want to specify norandommap if you're trying to hit every block on
> > > > > > > > > > the device.
> > > > > > > > > >
> > > > > > > > > > -Joe

I have pretty much standardized on two sequential drive writes and four
random drive writes to get to steady state. It may be overkill, but it has
worked well for me, since we test a variety of SSDs and some reach steady
state faster than others.
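In script form that flow is not much more than the sketch below. The device
path, block sizes, and queue depth are placeholders rather than a
recommendation; adjust them to whatever the drive under test needs:

#!/bin/bash
# Rough sketch of the preconditioning flow described above: two full
# sequential passes, then four full random-write passes over the device.
# All values here are illustrative placeholders.
dev=/dev/nvme0n1

# Two sequential fills of the whole device
./fio --name=seq-fill --filename=$dev --ioengine=libaio --direct=1 --thread \
      --bs=128k --rw=write --iodepth=128 --size=100% --loops=2

# Four random-write passes over the whole device
./fio --name=rand-fill --filename=$dev --ioengine=libaio --direct=1 --thread \
      --bs=4k --rw=randwrite --iodepth=128 --norandommap --randrepeat=0 \
      --size=100% --loops=4

Whether four random passes are really necessary depends on the drive; the
ss/ss_dur steady-state options used earlier in the thread are a more
principled stopping condition than a fixed pass count.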