* rand-read: increasing iodepth makes throughput drop
@ 2018-10-24 18:02 Federico Motta
  2018-10-25  5:55 ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 10+ messages in thread

From: Federico Motta @ 2018-10-24 18:02 UTC (permalink / raw)
To: fio; +Cc: Paolo Valente, Mark Brown, Linus Walleij, Ulf Hansson

Hi,
Paolo Valente and I found an unexpected surprise while investigating the
impact of iodepth on throughput. Both on my device (ssd samsung 960 pro
nvme m.2 512gb) and on his (ssd plextor px-256m5s), throughput decreases
as the iodepth increases.

What follows is a summary of the testbed details:
- mq-deadline scheduler
- 4 fio jobs doing rand-read I/O (4 is the number of physical cores of
  our cpus)
- throughput was measured with "iostat -tmd /dev/$disk 3"
- the first "mb_read/s" occurrence from the above command was ignored
  because of ramp time, and the average and stdev were computed only on
  the following ones
- duration of 15 sec

More details can be found in the jobfiles in attachment; please note that:
- the global sections differ only in the first two lines (iodepth and
  ioengine)
- the job sections differ only in N (0 to 3)

What we observed is that with "iodepth=1 and ioengine=sync" throughput
is greater than with "iodepth=8 and ioengine=libaio". For example:
- iodepth=1, sync:  170.942 ± 11.122 [mb/s]
- iodepth=8, async: 161.360 ± 10.165 [mb/s]

So, I would like to ask:
1) Is it normal that throughput drops? (shouldn't the device's internal
   queues be more filled with a greater iodepth and thus lead to greater
   throughput?)
2) If (1) is normal, for which reasons?
3) If (1) is not normal, we thought of investigating the average number
   of requests in the software and hardware queues (especially the
   latter). In fact we expect a correlation between that number and
   throughput.
4) If (3) is correct, how can we measure the average number of requests
   in the device's internal queues? Is it possible to do that directly
   with fio?
5) If (3) is not correct, what should we look for? And (if possible)
   with which tools?

On request I can give more details.

Thank you,
Federico

[-- Attachment #1.1.2: fio_jobfiles.zip --]
[-- Type: application/zip, Size: 2866 bytes --]

^ permalink raw reply [flat|nested] 10+ messages in thread
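The averaging procedure described above (drop the first iostat sample as ramp time, then compute mean and stdev of the rest) can be sketched as follows. The readings are made-up values, not data from the thread:

```shell
# Average and stdev of MB_read/s samples, skipping the first one
# (ramp time), mirroring the measurement procedure described above.
samples="90.0 168.5 172.1 169.9 171.3"
echo "$samples" | tr ' ' '\n' | tail -n +2 | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END { mean = sum / n
          stdev = sqrt((sumsq - n * mean * mean) / (n - 1))
          printf "%.3f ± %.3f [MB/s]\n", mean, stdev }'
# prints: 170.450 ± 1.586 [MB/s]
```

The first field is discarded by `tail -n +2`, matching the "ignore the first occurrence" step, and the stdev is the sample standard deviation (n-1 denominator).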
* RE: rand-read: increasing iodepth makes throughput drop
  2018-10-24 18:02 rand-read: increasing iodepth makes throughput drop Federico Motta
@ 2018-10-25  5:55 ` Elliott, Robert (Persistent Memory)
  2018-10-29  9:07   ` Federico Motta
  2018-10-31 14:19   ` Federico Motta
  0 siblings, 2 replies; 10+ messages in thread

From: Elliott, Robert (Persistent Memory) @ 2018-10-25 5:55 UTC (permalink / raw)
To: Federico Motta, fio; +Cc: Paolo Valente, Mark Brown, Linus Walleij, Ulf Hansson

> Paolo Valente and I found an unexpected surprise while investigating
> the impact of iodepth on throughput. Both on my device (ssd samsung
> 960 pro nvme m.2 512gb) and on his (ssd plextor px-256m5s), throughput
> decreases as the iodepth increases.
...
> What we observed is that with "iodepth=1 and ioengine=sync" throughput
> is greater than with "iodepth=8 and ioengine=libaio". For example:
> - iodepth=1, sync:  170.942 ± 11.122 [mb/s]
> - iodepth=8, async: 161.360 ± 10.165 [mb/s]
>
> So, I would like to ask:
> 1) Is it normal that throughput drops? (shouldn't the device's
> internal queues be more filled with a greater iodepth and thus lead
> to greater throughput?)

The drive specs are:
* 3500 MB/s sequential read
* 57.344 MB/s random read, QD1 (14000 IOPS doing random 4 KiB reads)
* 1351.68 MB/s random read, QD4 (330000 IOPS doing random 4 KiB reads)

so the bigger issue is why it's underachieving by 10x.

direct=0 means you're involving the page cache, which distorts results,
especially with libaio:

  iodepth=int
    Number of I/O units to keep in flight against the file. Note that
    increasing iodepth beyond 1 will not affect synchronous ioengines
    (except for small degrees when verify_async is in use). Even async
    engines may impose OS restrictions causing the desired depth not to
    be achieved. This may happen on Linux when using libaio and not
    setting direct=1, since buffered IO is not async on that OS. Keep
    an eye on the IO depth distribution in the fio output to verify
    that the achieved depth is as expected. Default: 1.

If you're trying to test storage device performance, stick with direct=1.
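A jobfile along the lines suggested above might look like this, reconstructed from the testbed details earlier in the thread. The device path, block size, and per-job CPU pinning are assumptions; the actual jobfiles are in the zip attachment:

```shell
# Sketch of a direct=1 libaio jobfile for the QD8 case; swap the first
# two global lines for "ioengine=sync" / "iodepth=1" to get the other
# variant, as the first message describes.
cat > /tmp/randread-qd8.fio <<'EOF'
[global]
ioengine=libaio
iodepth=8
direct=1
rw=randread
bs=4k
runtime=15
ramp_time=3
filename=/dev/nvme0n1

[job0]
cpus_allowed=0
[job1]
cpus_allowed=1
[job2]
cpus_allowed=2
[job3]
cpus_allowed=3
EOF
# fio /tmp/randread-qd8.fio   # needs root for a raw device
```

With direct=1 the page cache is bypassed, so the libaio submissions stay asynchronous and the requested depth can actually be reached.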
* Re: rand-read: increasing iodepth makes throughput drop
  2018-10-25  5:55 ` Elliott, Robert (Persistent Memory)
@ 2018-10-29  9:07   ` Federico Motta
  2018-10-31 14:19   ` Federico Motta
  1 sibling, 0 replies; 10+ messages in thread

From: Federico Motta @ 2018-10-29 9:07 UTC (permalink / raw)
To: Elliott, Robert (Persistent Memory), fio
Cc: Paolo Valente, Mark Brown, Linus Walleij, Ulf Hansson

On 10/25/18 7:55 AM, Elliott, Robert (Persistent Memory) wrote:
>> Paolo Valente and I found an unexpected surprise while investigating
>> the impact of iodepth on throughput. Both on my device (ssd samsung
>> 960 pro nvme m.2 512gb) and on his (ssd plextor px-256m5s),
>> throughput decreases as the iodepth increases.
> ...
>> What we observed is that with "iodepth=1 and ioengine=sync"
>> throughput is greater than with "iodepth=8 and ioengine=libaio".
>> For example:
>> - iodepth=1, sync:  170.942 ± 11.122 [mb/s]
>> - iodepth=8, async: 161.360 ± 10.165 [mb/s]
>>
>> So, I would like to ask:
>> 1) Is it normal that throughput drops? (shouldn't the device's
>> internal queues be more filled with a greater iodepth and thus lead
>> to greater throughput?)

Hi,
Thank you for your reply.

> The drive specs are:
> * 3500 MB/s sequential read
> * 57.344 MB/s random read, QD1 (14000 IOPS doing random 4 KiB reads)
> * 1351.68 MB/s random read, QD4 (330000 IOPS doing random 4 KiB reads)
>
> so the bigger issue is why it's underachieving by 10x.
>
> direct=0 means you're involving the page cache, which distorts
> results, especially with libaio:
>
>   iodepth=int
>     Number of I/O units to keep in flight against the file. Note that
>     increasing iodepth beyond 1 will not affect synchronous ioengines
>     (except for small degrees when verify_async is in use). Even async
>     engines may impose OS restrictions causing the desired depth not
>     to be achieved. This may happen on Linux when using libaio and not
>     setting direct=1, since buffered IO is not async on that OS. Keep
>     an eye on the IO depth distribution in the fio output to verify
>     that the achieved depth is as expected. Default: 1.

You are right; I'm sorry I didn't read the docs carefully enough.

> If you're trying to test storage device performance, stick with direct=1.

In fact, with direct=1 the "iodepth=8 and ioengine=libaio" configuration
now performs much better than "iodepth=1 and ioengine=sync".

Thank you again for your help,
Federico
* Re: rand-read: increasing iodepth makes throughput drop
  2018-10-25  5:55 ` Elliott, Robert (Persistent Memory)
  2018-10-29  9:07   ` Federico Motta
@ 2018-10-31 14:19   ` Federico Motta
  2018-10-31 17:31     ` Jeff Furlong
  1 sibling, 1 reply; 10+ messages in thread

From: Federico Motta @ 2018-10-31 14:19 UTC (permalink / raw)
To: Elliott, Robert (Persistent Memory), fio
Cc: Paolo Valente, Mark Brown, Linus Walleij, Ulf Hansson

Hi, I am back again with new doubts, hoping someone can clarify them.

Although the previous issue was solved with direct=1, we are still
investigating possible reasons why, in some scenarios, bfq throughput is
about 5-10% lower than mq-deadline's.

Is it possible to ask fio for a measure of "how much the device's
internal queues are filled"?
If yes, with which option(s)?
If not, do you know some other tools or procedures which can tell us that?

On 10/25/18 7:55 AM, Elliott, Robert (Persistent Memory) wrote:
> The drive specs are:
> * 3500 MB/s sequential read
> * 57.344 MB/s random read, QD1 (14000 IOPS doing random 4 KiB reads)
> * 1351.68 MB/s random read, QD4 (330000 IOPS doing random 4 KiB reads)
>
> so the bigger issue is why it's underachieving by 10x.
>
> direct=0 means you're involving the page cache, which distorts
> results, especially with libaio:
>
>   iodepth=int
>     Number of I/O units to keep in flight against the file. Note that
>     increasing iodepth beyond 1 will not affect synchronous ioengines
>     (except for small degrees when verify_async is in use). Even async
>     engines may impose OS restrictions causing the desired depth not
>     to be achieved. This may happen on Linux when using libaio and not
>     setting direct=1, since buffered IO is not async on that OS. Keep
>     an eye on the IO depth distribution in the fio output to verify
>     that the achieved depth is as expected. Default: 1.

About that "IO depth distribution" I have another question; with the
jobfile attached, the iodepth distribution of the 4 jobs is something
like:

IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.4%, 16=0.0%, 32=0.0%, >=64=0.0%
IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.3%, 16=0.0%, 32=0.0%, >=64=0.0%
IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.2%, 16=0.0%, 32=0.0%, >=64=0.0%
IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.2%, 16=0.0%, 32=0.0%, >=64=0.0%

Is it normal that the sum of these percentages is not 100 but around 133?
(For completeness, in the attachment you can also find the whole fio
output.)

Thank you for your patience,
regards,
Federico

> If you're trying to test storage device performance, stick with direct=1.

[-- Attachment #1.1.2: attachment.zip --]
[-- Type: application/zip, Size: 1812 bytes --]
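The oddity being asked about can be checked mechanically: sum the percentage buckets on one of the "IO depths" lines quoted above (in a correct report they should add up to roughly 100):

```shell
# Sum the percentage buckets of a fio "IO depths" line; the sample line
# is copied from the output quoted in the message above.
line="IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=133.4%, 16=0.0%, 32=0.0%, >=64=0.0%"
echo "$line" | grep -o '[0-9.]*%' | tr -d '%' | awk '{ s += $1 } END { printf "%.1f\n", s }'
# prints: 133.7
```

A total well above 100 points at an accounting bug rather than at the workload itself, which is what the next reply confirms.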
* RE: rand-read: increasing iodepth makes throughput drop
  2018-10-31 14:19   ` Federico Motta
@ 2018-10-31 17:31     ` Jeff Furlong
  2018-11-07 11:08       ` Federico Motta
  0 siblings, 1 reply; 10+ messages in thread

From: Jeff Furlong @ 2018-10-31 17:31 UTC (permalink / raw)
To: Federico Motta, Elliott, Robert (Persistent Memory), fio
Cc: Paolo Valente, Mark Brown, Linus Walleij, Ulf Hansson

> Is it normal that the sum of these percentages is not 100 but around 133?

Looks like you're running fio 3.9. There was a recent patch, after the
3.11 release, to resolve this. Try the latest from git.

Regards,
Jeff
* Re: rand-read: increasing iodepth makes throughput drop
  2018-10-31 17:31     ` Jeff Furlong
@ 2018-11-07 11:08       ` Federico Motta
  2018-11-07 15:38         ` Sitsofe Wheeler
  0 siblings, 1 reply; 10+ messages in thread

From: Federico Motta @ 2018-11-07 11:08 UTC (permalink / raw)
To: Jeff Furlong, Elliott, Robert (Persistent Memory), fio
Cc: Paolo Valente, Mark Brown, Linus Walleij, Ulf Hansson

Hi,

On 10/31/18 6:31 PM, Jeff Furlong wrote:
>> Is it normal that the sum of these percentages is not 100 but around
>> 133?
> Looks like you're running fio 3.9. There was a recent patch, after the
> 3.11 release, to resolve this. Try the latest from git.

Thank you; the script I wrote to keep my fio updated to the latest
git tag was missing a "--version-sort" :(

With fio-3.12 the sum is 100 as expected.

On 10/31/18 3:19 PM, Federico Motta wrote:
...
> Is it possible to ask fio for a measure of "how much the device's
> internal queues are filled"?
> If yes, with which option(s)?
> If not, do you know some other tools or procedures which can tell us
> that?

Does somebody have ideas about that ^ ?

Regards,
Federico
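The tag-sorting pitfall mentioned above can be reproduced with plain coreutils: lexicographic sort puts fio-3.9 after fio-3.12, so a script that takes the last tag without --version-sort pins an old release.

```shell
# Pick the "latest" git tag from a list: lexicographic vs version order.
tags="fio-3.9
fio-3.11
fio-3.12"
echo "$tags" | sort | tail -n 1                  # prints: fio-3.9
echo "$tags" | sort --version-sort | tail -n 1   # prints: fio-3.12
```

With --version-sort (also spelled -V), sort compares the numeric runs inside the strings, so 3.9 < 3.11 < 3.12 as intended.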
* Re: rand-read: increasing iodepth makes throughput drop
  2018-11-07 11:08       ` Federico Motta
@ 2018-11-07 15:38         ` Sitsofe Wheeler
  2018-11-09  9:30           ` Federico Motta
  0 siblings, 1 reply; 10+ messages in thread

From: Sitsofe Wheeler @ 2018-11-07 15:38 UTC (permalink / raw)
To: federico
Cc: Jeff Furlong, Elliott, Robert (Persistent Memory), fio,
    Paolo Valente, mark.brown, linus.walleij, ulf.hansson

On Wed, 7 Nov 2018 at 11:09, Federico Motta <federico@willer.it> wrote:
>
> Hi,
>
> On 10/31/18 6:31 PM, Jeff Furlong wrote:
> >> Is it normal that the sum of these percentages is not 100 but
> >> around 133?
> > Looks like you're running fio 3.9. There was a recent patch, after
> > the 3.11 release, to resolve this. Try the latest from git.
>
> Thank you; the script I wrote to keep my fio updated to the latest
> git tag was missing a "--version-sort" :(
>
> With fio-3.12 the sum is 100 as expected.
>
> On 10/31/18 3:19 PM, Federico Motta wrote:
> ...
> > Is it possible to ask fio for a measure of "how much the device's
> > internal queues are filled"?
> > If yes, with which option(s)?
> > If not, do you know some other tools or procedures which can tell us
> > that?
>
> Does somebody have ideas about that ^ ?

I think fio will report something similar back on Linux when it is
able to work it out - see the "And finally, the disk statistics are
printed" section over in
https://fio.readthedocs.io/en/latest/fio_doc.html#interpreting-the-output .
The util percentage might be as good as it gets (i.e. a device that
can take more I/Os will have a low percentage). I think iostat
produces similar output to that too...

--
Sitsofe | http://sucs.org/~sits/
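For reference, iostat's extended view ("iostat -x") exposes the queue-size and utilization figures referred to above. A sketch that pulls them out of one device line; the sample line is fabricated, and the column layout follows sysstat ~11.x and varies between versions, so check the header on your system before trusting the field numbers:

```shell
# On sysstat ~11.x the extended columns are:
# Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await
#         r_await w_await svctm %util
# so avgqu-sz (average queue length) is field 9 and %util is field 14.
line="nvme0n1 0.00 0.00 14000.00 0.00 56000.00 0.00 8.00 7.85 0.56 0.56 0.00 0.07 98.40"
echo "$line" | awk '{ printf "avgqu-sz=%s util=%s%%\n", $9, $14 }'
# prints: avgqu-sz=7.85 util=98.40%
```

avgqu-sz approximates the average number of requests outstanding at the block layer, which is closer to the "how filled are the queues" question than the bare util percentage.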
* Re: rand-read: increasing iodepth makes throughput drop
  2018-11-07 15:38         ` Sitsofe Wheeler
@ 2018-11-09  9:30           ` Federico Motta
  2018-11-09 20:21             ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 10+ messages in thread

From: Federico Motta @ 2018-11-09 9:30 UTC (permalink / raw)
To: Sitsofe Wheeler
Cc: Jeff Furlong, Elliott, Robert (Persistent Memory), fio,
    Paolo Valente, mark.brown, linus.walleij, ulf.hansson

On 11/7/18 4:38 PM, Sitsofe Wheeler wrote:
> On Wed, 7 Nov 2018 at 11:09, Federico Motta <federico@willer.it> wrote:
>>
>> On 10/31/18 3:19 PM, Federico Motta wrote:
>> ...
>>> Is it possible to ask fio for a measure of "how much the device's
>>> internal queues are filled"?
>>> If yes, with which option(s)?
>>> If not, do you know some other tools or procedures which can tell us
>>> that?
>>
>> Does somebody have ideas about that ^ ?
>
> I think fio will report something similar back on Linux when it is
> able to work it out - see the "And finally, the disk statistics are
> printed" section over in
> https://fio.readthedocs.io/en/latest/fio_doc.html#interpreting-the-output
> . The util percentage might be as good as it gets (i.e. a device that
> can take more I/Os will have a low percentage). I think iostat
> produces similar output to that too...

Actually I was wondering whether a more direct or lower-level measure
exists, but since I don't even know if that is possible, I think the fio
"util" percentage could be a good trade-off between "a low-level
measure" and "no measure available".

Thank you :)
Have a nice day,
Federico
* RE: rand-read: increasing iodepth makes throughput drop
  2018-11-09  9:30           ` Federico Motta
@ 2018-11-09 20:21             ` Elliott, Robert (Persistent Memory)
  2018-11-10  9:29               ` Paolo Valente
  0 siblings, 1 reply; 10+ messages in thread

From: Elliott, Robert (Persistent Memory) @ 2018-11-09 20:21 UTC (permalink / raw)
To: 'Federico Motta', Sitsofe Wheeler
Cc: Jeff Furlong, fio, Paolo Valente, mark.brown, linus.walleij,
    ulf.hansson

>>> On 10/31/18 3:19 PM, Federico Motta wrote:
>>> ...
>>>> Is it possible to ask fio for a measure of "how much the device's
>>>> internal queues are filled"?
>>>> If yes, with which option(s)?
>>>> If not, do you know some other tools or procedures which can tell
>>>> us that?
>>>
>>> Does somebody have ideas about that ^ ?
>>
>> ...I think iostat produces similar output to that too...
>
> Actually I was wondering whether a more direct or lower-level measure
> exists, but since I don't even know if that is possible, I think the
> fio "util" percentage could be a good trade-off between "a low-level
> measure" and "no measure available".

In Linux, iostat provides some statistics, and there are lots of more
detailed queuing statistics in sysfs. Some are provided by the block
layer; some by the SCSI midlayer (for SCSI devices); some by the
low-level driver (like hpsa).

# nr_requests, the block layer request queue depth
/sys/block/$device/queue/nr_requests

# queue_depth, the SCSI midlayer queue depth
/sys/block/$device/device/queue_depth

# can_queue sets the initial request queue size
/sys/class/scsi_host/$scsi_host/can_queue

# cmd_per_lun sets the initial queue_depth
/sys/class/scsi_host/$scsi_host/cmd_per_lun

# host_busy indicates how many commands are queued by the SCSI layer
/sys/class/scsi_host/$scsi_host/host_busy

# host_blocked indicates the host is blocking all IOs
/sys/class/scsi_host/$scsi_host/host_blocked

# hpsa ioaccel commands outstanding
/sys/block/$device/device/hpsa_ioaccel_cmds_out

# hpsa physical drive queue depth
/sys/block/$device/device/hpsa_queue_depth

# hpsa commands_outstanding from submission to completion
/sys/class/scsi_host/$scsi_host/commands_outstanding

There are a bunch of blk-mq stats too, some only shown if debugfs is
enabled.
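The sysfs attributes listed above can be sampled with a small guarded helper; a sketch, with `sda` as a hypothetical device name (not every attribute exists for every device or driver, so missing ones are reported as n/a rather than treated as errors):

```shell
# Print a sysfs queue attribute if present, "n/a" otherwise.
show_attr() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: n/a\n' "$1"
    fi
}

dev=sda   # hypothetical device; substitute your own
show_attr "/sys/block/$dev/queue/nr_requests"
show_attr "/sys/block/$dev/device/queue_depth"
```

Sampling these in a loop while fio runs gives the configured depths; the instantaneous occupancy lives in attributes like host_busy and in the blk-mq debugfs entries mentioned at the end of the message.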
* Re: rand-read: increasing iodepth makes throughput drop
  2018-11-09 20:21             ` Elliott, Robert (Persistent Memory)
@ 2018-11-10  9:29               ` Paolo Valente
  0 siblings, 0 replies; 10+ messages in thread

From: Paolo Valente @ 2018-11-10 9:29 UTC (permalink / raw)
To: Elliott, Robert (Persistent Memory)
Cc: Federico Motta, Sitsofe Wheeler, Jeff Furlong, fio, mark.brown,
    linus.walleij, ulf.hansson

> Il giorno 9 nov 2018, alle ore 21:21, Elliott, Robert (Persistent
> Memory) <elliott@hpe.com> ha scritto:
>
>>>> On 10/31/18 3:19 PM, Federico Motta wrote:
>>>> ...
>>>>> Is it possible to ask fio for a measure of "how much the device's
>>>>> internal queues are filled"?
>>>>> If yes, with which option(s)?
>>>>> If not, do you know some other tools or procedures which can tell
>>>>> us that?
>>>>
>>>> Does somebody have ideas about that ^ ?
>>>
>>> ...I think iostat produces similar output to that too...
>>
>> Actually I was wondering whether a more direct or lower-level measure
>> exists, but since I don't even know if that is possible, I think the
>> fio "util" percentage could be a good trade-off between "a low-level
>> measure" and "no measure available".
>
> In Linux, iostat provides some statistics, and there are lots of more
> detailed queuing statistics in sysfs. Some are provided by the block
> layer; some by the SCSI midlayer (for SCSI devices); some by the
> low-level driver (like hpsa).
>
> # nr_requests, the block layer request queue depth
> /sys/block/$device/queue/nr_requests
>
> # queue_depth, the SCSI midlayer queue depth
> /sys/block/$device/device/queue_depth
>
> # can_queue sets the initial request queue size
> /sys/class/scsi_host/$scsi_host/can_queue
>
> # cmd_per_lun sets the initial queue_depth
> /sys/class/scsi_host/$scsi_host/cmd_per_lun
>
> # host_busy indicates how many commands are queued by the SCSI layer
> /sys/class/scsi_host/$scsi_host/host_busy

Thank you very much, this seems exactly the kind of information we need.
As far as you know, are there also per-queue stats in the case of NVMe
devices?

Thanks,
Paolo

> # host_blocked indicates the host is blocking all IOs
> /sys/class/scsi_host/$scsi_host/host_blocked
>
> # hpsa ioaccel commands outstanding
> /sys/block/$device/device/hpsa_ioaccel_cmds_out
>
> # hpsa physical drive queue depth
> /sys/block/$device/device/hpsa_queue_depth
>
> # hpsa commands_outstanding from submission to completion
> /sys/class/scsi_host/$scsi_host/commands_outstanding
>
> There are a bunch of blk-mq stats too, some only shown if debugfs is
> enabled.
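On the NVMe question: there is no SCSI midlayer involved, but the blk-mq debugfs entries mentioned at the end of the previous message do expose per-hardware-queue state. A guarded sketch, assuming a kernel built with CONFIG_BLK_DEBUG_FS, debugfs mounted at /sys/kernel/debug, and root access; the device name is hypothetical:

```shell
# blk-mq debugfs exposes one hctx<N> directory per hardware queue; the
# "busy" file inside lists the requests currently owned by the driver,
# so counting its lines gives a per-queue occupancy snapshot.
list_hctx_busy() {
    dir="/sys/kernel/debug/block/$1"
    if [ ! -d "$dir" ]; then
        echo "no blk-mq debugfs for $1"
        return
    fi
    for h in "$dir"/hctx*; do
        printf '%s busy entries: %s\n' "${h##*/}" "$(wc -l < "$h/busy")"
    done
}

list_hctx_busy nvme0n1   # hypothetical device
```

Polling this while the workload runs would give the "average number of requests in the hardware queues" asked about earlier in the thread, one value per NVMe submission queue.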