From: Federico Motta <federico@willer.it>
To: "Elliott, Robert (Persistent Memory)" <elliott@hpe.com>,
	"fio@vger.kernel.org" <fio@vger.kernel.org>
Cc: Paolo Valente <paolo.valente@linaro.org>,
	Mark Brown <mark.brown@linaro.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Ulf Hansson <ulf.hansson@linaro.org>
Subject: Re: rand-read: increasing iodepth makes throughput drop
Date: Wed, 31 Oct 2018 15:19:34 +0100	[thread overview]
Message-ID: <a26c6cc6-3efe-cbf8-37c7-f52672179b79@willer.it> (raw)
In-Reply-To: <AT5PR8401MB116955F32C7A59FB0494F505ABF70@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM>


[-- Attachment #1.1.1: Type: text/plain, Size: 2956 bytes --]

Hi,
I am back again, with new doubts, hoping someone can clarify them.

Although the previous issue was solved with direct=1, we are still
investigating why, in some scenarios, bfq's throughput is about 5-10%
lower than mq-deadline's.
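
For context, the comparison we run is roughly the following (a minimal
sketch, not our actual procedure; the device name is a placeholder and
both schedulers are assumed to be available on the device):

  # the scheduler shown in brackets is the active one
  cat /sys/block/nvme0n1/queue/scheduler
  echo bfq > /sys/block/nvme0n1/queue/scheduler
  # ... run the fio job and note the throughput ...
  echo mq-deadline > /sys/block/nvme0n1/queue/scheduler
  # ... run the same fio job again and compare ...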

Is it possible to ask fio for a measure of how full the device's
internal queues are?
If yes, with which option(s)?
If not, do you know of other tools or procedures that could tell us that?
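
(The only rough approximations we are aware of live outside fio, e.g.
the block layer's in-flight counters or iostat's average queue size;
a sketch, with a placeholder device name:

  # two numbers: in-flight reads and writes for the device
  watch -n 1 cat /sys/block/nvme0n1/inflight

  # per-device average request queue size (avgqu-sz / aqu-sz, depending
  # on the sysstat version), refreshed every second
  iostat -x 1 nvme0n1

but it is not obvious how well these reflect the drive's internal
queues, hence the question.)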

On 10/25/18 7:55 AM, Elliott, Robert (Persistent Memory) wrote:
>> Paolo Valente and I found an unexpected surprise while investigating
>> iodepth impact on throughput. Both on my device (ssd samsung 960 pro
>> nvme m.2 512gb) and his one (ssd plextor px-256m5s), throughput
>> decreases as the iodepth increases.
> ...
>> What we observed is that with "iodepth=1 and ioengine=sync"
>> throughput
>> is greater than with "iodepth=8 and ioengine=libaio".  For example:
>> - iodepth=1,  sync: 170.942 ± 11.122 [mb/s]
>> - iodepth=8, async: 161.360 ± 10.165 [mb/s]
>>
>>
>> So, I would like to ask:
>> 1) Is it normal that throughput drops? (Shouldn't the device's
>> internal queues be fuller with a greater iodepth, and thus yield
>> greater throughput?)
> 
> The drive specs are:
> * 3500 MB/s sequential read
> * 57.344 MB/s random read, QD1 (14000 IOPS doing random 4 KiB reads)
> * 1351.68 MB/s random read, QD4 (330000 IOPS doing random 4 KiB reads)
> 
> so the bigger issue is why it's underachieving by 10x.
> 
> direct=0 means you're involving the page cache, which distorts results,
> especially with libaio:
> 
>     iodepth=int
>     Number of I/O units to keep in flight against the file. Note that
>     increasing iodepth beyond 1 will not affect synchronous ioengines
>     (except for small degrees when verify_async is in use). Even async
>     engines may impose OS restrictions causing the desired depth not to
>     be achieved. This may happen on Linux when using libaio and not
>     setting direct=1, since buffered IO is not async on that OS. Keep
>     an eye on the IO depth distribution in the fio output to verify
>     that the achieved depth is as expected. Default: 1. 
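
For reference, the jobs now follow that advice (libaio with direct=1
and iodepth=8); a minimal sketch of such a run, with placeholder device
and sizes rather than the attached jobfile, would be:

  fio --name=randread --filename=/dev/nvme0n1 --rw=randread --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=8 --numjobs=4 \
      --runtime=30 --time_based
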
About that "IO depth distribution" I have another question: with the
attached jobfile, the iodepth distribution of the 4 jobs looks like this:

IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.4%, 16=0.0%, 32=0.0%, >=64=0.0%
IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.3%, 16=0.0%, 32=0.0%, >=64=0.0%
IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.2%, 16=0.0%, 32=0.0%, >=64=0.0%
IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.2%, 16=0.0%, 32=0.0%, >=64=0.0%

Is it normal that these percentages do not sum to 100% but to about
133%? (For completeness, the attachment also contains the whole fio
output.)

Thank you for your patience,
regards,
Federico


> 
> If you're trying to test storage device performance, stick with direct=1.
> 
> 


[-- Attachment #1.1.2: attachment.zip --]
[-- Type: application/zip, Size: 1812 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

Thread overview: 10+ messages
2018-10-24 18:02 rand-read: increasing iodepth makes throughput drop Federico Motta
2018-10-25  5:55 ` Elliott, Robert (Persistent Memory)
2018-10-29  9:07   ` Federico Motta
2018-10-31 14:19   ` Federico Motta [this message]
2018-10-31 17:31     ` Jeff Furlong
2018-11-07 11:08       ` Federico Motta
2018-11-07 15:38         ` Sitsofe Wheeler
2018-11-09  9:30           ` Federico Motta
2018-11-09 20:21             ` Elliott, Robert (Persistent Memory)
2018-11-10  9:29               ` Paolo Valente
