io-uring.vger.kernel.org archive mirror
From: Jeffle Xu <jefflexu@linux.alibaba.com>
To: snitzer@redhat.com
Cc: joseph.qi@linux.alibaba.com, dm-devel@redhat.com,
	linux-block@vger.kernel.org, io-uring@vger.kernel.org
Subject: [PATCH v2 0/6] dm: support IO polling for bio-based dm device
Date: Mon, 25 Jan 2021 20:13:34 +0800
Message-ID: <20210125121340.70459-1-jefflexu@linux.alibaba.com>

Since there is currently no simple yet efficient way to implement
bio-based IO polling in the split-bio tracking style, this patch set
returns to the original implementation mechanism, which iterates over
and polls all underlying hw queues in polling mode. One optimization is
introduced to mitigate the race for the same hw queue among multiple
polling instances.

I'm still open to the split-bio tracking mechanism if there's a
reasonable way to implement it.


[Performance Test]
Performance was tested with fio (engine=io_uring), 4k randread, on a
dm-linear device. The dm-linear device is built upon nvme devices,
and every nvme device has one polling hw queue (nvme.poll_queues=1).

Test Case                   | IOPS in IRQ mode | IOPS in polling mode | Diff
                            | (hipri=0)        | (hipri=1)            |
--------------------------- | ---------------- | -------------------- | ----
3 target nvme, num_jobs = 1 | 198k             | 276k                 | ~40%
3 target nvme, num_jobs = 3 | 608k             | 705k                 | ~16%
6 target nvme, num_jobs = 6 | 1197k            | 1347k                | ~13%
3 target nvme, num_jobs = 6 | 1285k            | 1293k                | ~0%

As the number of polling instances (num_jobs) increases, the
performance improvement decreases, though it's still positive
compared to the IRQ mode.
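
For reference, the Diff column can be reproduced from the two IOPS
columns. The helper below is only an illustrative sketch; the name
poll_gain_percent() is made up for this example and is not part of the
patch set.

```c
#include <assert.h>

/*
 * Relative gain of polling mode over IRQ mode, in percent.
 * Both arguments are IOPS in the same unit (here: thousands).
 * Hypothetical helper, for illustration only.
 */
static double poll_gain_percent(double irq_kiops, double poll_kiops)
{
    assert(irq_kiops > 0.0);    /* avoid dividing by zero */
    return (poll_kiops - irq_kiops) / irq_kiops * 100.0;
}
```

With the numbers from the table, poll_gain_percent(198, 276) is about
39.4 and poll_gain_percent(1285, 1293) is about 0.6, which matches the
rounded ~40% and ~0% entries.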

[Optimization]
To mitigate the race when iterating over all the underlying hw queues,
one flag is maintained per hw queue. This flag indicates whether the
polling hw queue is currently being polled. Every polling hw queue is
exclusive to one polling instance at a time, i.e., a polling instance
will skip a polling hw queue that is currently being polled by another
polling instance, and start polling on the next hw queue.
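
The skip-if-busy idea can be sketched in userspace C11 as follows. This
is a toy model and not the code in the patch: struct poll_queue,
poll_queue_init() and poll_all_queues() are hypothetical names, and the
"pending" counter merely stands in for completions that a real poll
would reap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one polling hw queue. */
struct poll_queue {
    atomic_flag busy;   /* set while some polling instance owns the queue */
    int pending;        /* completions waiting to be reaped (toy model) */
};

/* Hypothetical helper: mark the queue idle with `pending` completions. */
static void poll_queue_init(struct poll_queue *q, int pending)
{
    atomic_flag_clear(&q->busy);
    q->pending = pending;
}

/*
 * Visit every queue once. A queue whose flag is already set is being
 * polled by another instance, so it is skipped and the loop moves on
 * to the next queue. Returns the completions reaped by this caller.
 */
static int poll_all_queues(struct poll_queue *queues, size_t nr_queues)
{
    int reaped = 0;

    assert(queues != NULL || nr_queues == 0);

    for (size_t i = 0; i < nr_queues; i++) {
        struct poll_queue *q = &queues[i];

        /* test-and-set returns the old value: true means "taken". */
        if (atomic_flag_test_and_set_explicit(&q->busy,
                                              memory_order_acquire))
            continue;

        reaped += q->pending;   /* we own the queue: reap it */
        q->pending = 0;

        atomic_flag_clear_explicit(&q->busy, memory_order_release);
    }
    return reaped;
}
```

A caller never blocks on a busy queue; at worst it returns having
reaped nothing and polls again, which is the trade-off the flag makes
in exchange for avoiding the race.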

This per-hw-queue flag map is currently maintained in the dm layer. In
the table load phase, a table describing all underlying polling hw
queues is built and stored in 'struct dm_table'. This remains safe
when the mapping table is reloaded.


changes since v1:
- patches 1, 2 and 4 are the same as in v1 and have already been reviewed
- patch 3 is refactored a bit based on suggestions from
Mike Snitzer.
- patch 5 is newly added and introduces one new queue flag
indicating whether the queue is capable of IO polling. This mainly
simplifies the logic in queue_poll_store().
- patch 6 implements the core mechanism supporting IO polling.
The sanity check of whether the dm device supports IO polling is
also folded into this patch, and the queue flag will be cleared on
table reload if the device doesn't support it.
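
As a rough illustration of how a capability flag can simplify a store
handler, here is a userspace toy model. It is not the kernel code: the
QF_* bit values, struct toy_queue, toy_poll_store() and
toy_table_reload() are all hypothetical, and clearing both bits on
reload is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real flags live in the kernel headers. */
#define QF_POLL     (1UL << 0)  /* polling currently enabled */
#define QF_POLL_CAP (1UL << 1)  /* queue is capable of IO polling */

struct toy_queue {
    unsigned long flags;
};

/* Toy store handler: one capability test replaces per-driver checks. */
static int toy_poll_store(struct toy_queue *q, bool enable)
{
    assert(q != NULL);

    if (!(q->flags & QF_POLL_CAP))
        return -EINVAL;         /* queue cannot poll at all */

    if (enable)
        q->flags |= QF_POLL;
    else
        q->flags &= ~QF_POLL;
    return 0;
}

/* Toy table reload: drop the flags if the new table cannot poll
 * (assumption: both bits are cleared in that case). */
static void toy_table_reload(struct toy_queue *q, bool table_can_poll)
{
    assert(q != NULL);

    if (table_can_poll)
        q->flags |= QF_POLL_CAP;
    else
        q->flags &= ~(QF_POLL_CAP | QF_POLL);
}
```

In this model, enabling polling on a queue whose table was reloaded
without polling support fails with -EINVAL until a later reload
restores the capability bit.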


Jeffle Xu (6):
  block: move definition of blk_qc_t to types.h
  block: add queue_to_disk() to get gendisk from request_queue
  block: add iopoll method to support bio-based IO polling
  dm: always return BLK_QC_T_NONE for bio-based device
  block: add QUEUE_FLAG_POLL_CAP flag
  dm: support IO polling for bio-based dm device

 block/blk-core.c             |  76 +++++++++++++++++++++
 block/blk-mq.c               |  76 +++------------------
 block/blk-sysfs.c            |   3 +-
 drivers/md/dm-core.h         |  21 ++++++
 drivers/md/dm-table.c        | 127 +++++++++++++++++++++++++++++++++++
 drivers/md/dm.c              |  61 ++++++++++++-----
 include/linux/blk-mq.h       |   3 +
 include/linux/blk_types.h    |   2 +-
 include/linux/blkdev.h       |   9 +++
 include/linux/fs.h           |   2 +-
 include/linux/types.h        |   3 +
 include/trace/events/kyber.h |   6 +-
 12 files changed, 302 insertions(+), 87 deletions(-)

-- 
2.27.0



Thread overview: 17+ messages
2021-01-25 12:13 Jeffle Xu [this message]
2021-01-25 12:13 ` [PATCH v2 1/6] block: move definition of blk_qc_t to types.h Jeffle Xu
2021-01-25 12:13 ` [PATCH v2 2/6] block: add queue_to_disk() to get gendisk from request_queue Jeffle Xu
2021-01-27 17:20   ` Mike Snitzer
2021-01-25 12:13 ` [PATCH v2 3/6] block: add iopoll method to support bio-based IO polling Jeffle Xu
2021-01-27 17:14   ` Mike Snitzer
2021-01-28  8:40   ` Christoph Hellwig
2021-01-28 11:52     ` JeffleXu
2021-01-28 14:36       ` Christoph Hellwig
2021-01-25 12:13 ` [PATCH v2 4/6] dm: always return BLK_QC_T_NONE for bio-based device Jeffle Xu
2021-01-25 12:13 ` [PATCH v2 5/6] block: add QUEUE_FLAG_POLL_CAP flag Jeffle Xu
2021-01-27 17:13   ` Mike Snitzer
2021-01-28  2:07     ` JeffleXu
2021-01-25 12:13 ` [PATCH v2 6/6] dm: support IO polling for bio-based dm device Jeffle Xu
2021-01-29  7:37   ` JeffleXu
2021-01-27 17:19 ` [PATCH v2 0/6] " Mike Snitzer
2021-01-28  3:06   ` JeffleXu
