From: "Ober, Frank" <frank.ober@intel.com>
To: Keith Busch <kbusch@kernel.org>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"Rajendiran, Swetha" <swetha.rajendiran@intel.com>,
"Liang, Mark" <mark.liang@intel.com>,
"Derrick, Jonathan" <jonathan.derrick@intel.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: RE: Polled io for Linux kernel 5.x
Date: Tue, 31 Dec 2019 19:06:01 +0000 [thread overview]
Message-ID: <SN6PR11MB26691B36D7AEF22393CC04F38B260@SN6PR11MB2669.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20191220212049.GA5582@redsun51.ssa.fujisawa.hgst.com>
Hi Keith, the performance results I see are very close between poll_queues (pvsync2) and io_uring; I posted them below. Since this topic is still pretty new to people:
Is there anything we need to tell the reader/user about poll_queues? What is important for usage?
And can poll_queues be changed dynamically, or can it only be set at module load time?
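As far as I can tell, the polled queue count is a module parameter today, so it has to be in place when the nvme module loads (or on the kernel command line); a sketch, assuming the 5.x nvme driver's `poll_queues` parameter:

```shell
# Persist across module reloads (takes effect on next driver load/reboot):
echo "options nvme poll_queues=4" > /etc/modprobe.d/nvme-poll.conf

# Or set it on the kernel command line:
#   nvme.poll_queues=4

# Verify the value the running driver was loaded with:
cat /sys/module/nvme/parameters/poll_queues
```

The queue count is fixed at controller initialization, which is why it can't be changed on the fly.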
My goal is to update the blog we built around testing Optane SSDs. Is there any possibility of an LWN article that goes deeper into this poll_queues change?
What's interesting in the data below is that the completion latency (clat) for io_uring is lower (better), but its IOPS is not higher. pvsync2 is the most efficient, by a small margin, against the newer 3D XPoint device.
Thanks
Frank
Results:
kernel (elrepo) - 5.4.1-1.el8.elrepo.x86_64
cpu - Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz - pinned to run at 3.1
fio - fio-3.16-64-gfd988
Results of Gen2 Optane SSD with poll_queues (pvsync2) vs io_uring/hipri
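For reproducibility, the fio jobs behind these numbers were presumably along these lines (a sketch; the exact job file isn't shown in this thread, and the device path is an assumption):

```ini
[rand-read-4k-qd1]
filename=/dev/nvme3n1
rw=randread
bs=4k
iodepth=1
direct=1
time_based=1
runtime=120
; pvsync2 run:
ioengine=pvsync2
hipri=1
; io_uring run: replace the ioengine line above with
;   ioengine=io_uring
; (hipri=1 requests polled completion for both engines)
```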
pvsync2 (poll queues)
fio-3.16-64-gfd988
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=552MiB/s][r=141k IOPS][eta 00m:00s]
rand-read-4k-qd1: (groupid=0, jobs=1): err= 0: pid=10309: Tue Dec 31 10:49:33 2019
read: IOPS=141k, BW=552MiB/s (579MB/s)(64.7GiB/120001msec)
clat (nsec): min=6548, max=186309, avg=6809.48, stdev=497.58
lat (nsec): min=6572, max=186333, avg=6834.24, stdev=499.28
clat percentiles (usec):
| 1.0000th=[ 7], 5.0000th=[ 7], 10.0000th=[ 7],
| 20.0000th=[ 7], 30.0000th=[ 7], 40.0000th=[ 7],
| 50.0000th=[ 7], 60.0000th=[ 7], 70.0000th=[ 7],
| 80.0000th=[ 7], 90.0000th=[ 7], 95.0000th=[ 8],
| 99.0000th=[ 8], 99.5000th=[ 8], 99.9000th=[ 9],
| 99.9500th=[ 10], 99.9900th=[ 18], 99.9990th=[ 117],
| 99.9999th=[ 163]
bw ( KiB/s): min=563512, max=567392, per=100.00%, avg=565635.38, stdev=846.99, samples=239
iops : min=140878, max=141848, avg=141408.82, stdev=211.76, samples=239
lat (usec) : 10=99.97%, 20=0.03%, 50=0.01%, 100=0.01%, 250=0.01%
cpu : usr=6.28%, sys=93.55%, ctx=408, majf=0, minf=96
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=16969949,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=552MiB/s (579MB/s), 552MiB/s-552MiB/s (579MB/s-579MB/s), io=64.7GiB (69.5GB), run=120001-120001msec
Disk stats (read/write):
nvme3n1: ios=16955008/0, merge=0/0, ticks=101477/0, in_queue=0, util=99.95%
io_uring:
fio-3.16-64-gfd988
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=538MiB/s][r=138k IOPS][eta 00m:00s]
rand-read-4k-qd1: (groupid=0, jobs=1): err= 0: pid=10797: Tue Dec 31 10:53:29 2019
read: IOPS=138k, BW=539MiB/s (565MB/s)(63.1GiB/120001msec)
slat (nsec): min=1029, max=161248, avg=1204.69, stdev=219.02
clat (nsec): min=262, max=208952, avg=5735.42, stdev=469.73
lat (nsec): min=6691, max=210136, avg=7008.54, stdev=516.99
clat percentiles (usec):
| 1.0000th=[ 6], 5.0000th=[ 6], 10.0000th=[ 6],
| 20.0000th=[ 6], 30.0000th=[ 6], 40.0000th=[ 6],
| 50.0000th=[ 6], 60.0000th=[ 6], 70.0000th=[ 6],
| 80.0000th=[ 6], 90.0000th=[ 6], 95.0000th=[ 6],
| 99.0000th=[ 7], 99.5000th=[ 7], 99.9000th=[ 8],
| 99.9500th=[ 9], 99.9900th=[ 10], 99.9990th=[ 52],
| 99.9999th=[ 161]
bw ( KiB/s): min=548208, max=554504, per=100.00%, avg=551620.30, stdev=984.77, samples=239
iops : min=137052, max=138626, avg=137905.07, stdev=246.17, samples=239
lat (nsec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 10=99.98%, 20=0.01%, 50=0.01%
lat (usec) : 100=0.01%, 250=0.01%
cpu : usr=7.39%, sys=92.44%, ctx=408, majf=0, minf=93
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=16548899,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=539MiB/s (565MB/s), 539MiB/s-539MiB/s (565MB/s-565MB/s), io=63.1GiB (67.8GB), run=120001-120001msec
Disk stats (read/write):
nvme3n1: ios=16534429/0, merge=0/0, ticks=100320/0, in_queue=0, util=99.95%
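One way to reconcile the clat vs. IOPS observation above: at QD1, IOPS tracks total per-IO latency (roughly slat + clat), and io_uring's ~1.2us submission latency pushes its total lat above pvsync2's despite the lower clat. A quick check against the average "lat" values reported above:

```python
# Average total latency ("lat") from the two fio runs above, in nanoseconds.
pvsync2_lat_ns = 6834.24
io_uring_lat_ns = 7008.54

def qd1_iops(lat_ns: float) -> float:
    """At queue depth 1, one IO completes per total-latency interval."""
    return 1e9 / lat_ns

# io_uring's lower clat is offset by its submission latency, so its
# total latency, and hence QD1 IOPS, trails pvsync2 slightly.
print(round(qd1_iops(pvsync2_lat_ns)))   # theoretical ceiling vs ~141k measured
print(round(qd1_iops(io_uring_lat_ns)))  # theoretical ceiling vs ~138k measured
```

The ratio of the two total latencies (about 1.025) matches the measured IOPS ratio almost exactly.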
Happy New Year Keith!
-----Original Message-----
From: Keith Busch <kbusch@kernel.org>
Sent: Friday, December 20, 2019 1:21 PM
To: Ober, Frank <frank.ober@intel.com>
Cc: linux-block@vger.kernel.org; linux-nvme@lists.infradead.org; Derrick, Jonathan <jonathan.derrick@intel.com>; Rajendiran, Swetha <swetha.rajendiran@intel.com>; Liang, Mark <mark.liang@intel.com>
Subject: Re: Polled io for Linux kernel 5.x
On Thu, Dec 19, 2019 at 09:59:14PM +0000, Ober, Frank wrote:
> Thanks Keith, it makes sense to reserve and set it up uniquely if you
> can save hw interrupts. But why would io_uring then not need these
> queues, because a stack trace I ran shows without the special queues I
> am still entering bio_poll. With pvsync2 I can only do polled io with
> the poll_queues?
Polling can happen only if you have polled queues, so io_uring is not accomplishing anything by calling iopoll. I don't see an immediately good way to pass that information up to io_uring, though.
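For what it's worth, whether the block layer will actually poll a given device can be inspected from sysfs (`queue/io_poll` per the kernel's block sysfs documentation); a minimal sketch, where the helper name and the `sysfs_root` parameter are mine, added so it can be exercised against a fake tree:

```python
from pathlib import Path

def io_poll_enabled(dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if polling is enabled for `dev`, i.e.
    <sysfs_root>/<dev>/queue/io_poll reads as nonzero."""
    return int(Path(sysfs_root, dev, "queue", "io_poll").read_text()) != 0
```

`io_poll` reads 0 when the driver registered no poll queues; the sibling `io_poll_delay` attribute selects the polling style (-1 classic, 0 hybrid adaptive).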
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Thread overview: 5+ messages
2019-12-19 19:25 Polled io for Linux kernel 5.x Ober, Frank
2019-12-19 20:52 ` Keith Busch
2019-12-19 21:59 ` Ober, Frank
2019-12-20 21:20 ` Keith Busch
2019-12-31 19:06 ` Ober, Frank [this message]