All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Harris James R <james.r.harris@intel.com>,
	io-uring@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	ZiyangZhang <ZiyangZhang@linux.alibaba.com>,
	Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Ming Lei <ming.lei@redhat.com>
Subject: [PATCH V2 0/1] ubd: add io_uring based userspace block driver
Date: Tue, 17 May 2022 13:53:57 +0800	[thread overview]
Message-ID: <20220517055358.3164431-1-ming.lei@redhat.com> (raw)

Hello Guys,

ubd driver is one kernel driver for implementing generic userspace block
device/driver, which delivers io request from ubd block device(/dev/ubdbN) into
ubd server[1] which is the userspace part of ubd for communicating
with ubd driver and handling specific io logic by its target module.

Another thing ubd driver handles is to copy data between user space buffer
and request/bio's pages, or take zero copy if mm is ready for support it in
future. ubd driver doesn't handle any IO logic of the specific driver, so
it is small/simple, and all io logics are done by the target code in ubdserver.

The above two are main jobs done by ubd driver.

ubd driver can help to move IO logic into userspace, in which the
development work is easier/more effective than doing in kernel, such as,
ubd-loop takes < 200 lines of loop specific code to get basically same 
function with kernel loop block driver, meantime the performance is
still good. ubdsrv[1] provide built-in test for comparing both by running
"make test T=loop".

Another example is high performance qcow2 support[2], which could be built with
ubd framework more easily than doing it inside kernel.

Also there are more people who express interests on userspace block driver[3],
Gabriel Krisman Bertazi proposes this topic in lsf/mm/ebpf 2022 and mentioned
requirement from Google. Ziyang Zhang from Alibaba said they "plan to
replace TCMU by UBD as a new choice" because UBD can get better throughput than
TCMU even with single queue[4], meantime UBD is simple. Also there is userspace
storage service for providing storage to containers.

It is io_uring based: io request is delivered to userspace via new added
io_uring command which has been proved as very efficient for making nvme
passthrough IO to get better IOPS than io_uring(READ/WRITE). Meantime one
shared/mmap buffer is used for sharing io descriptor to userspace, the
buffer is readonly for userspace, each IO just takes 24bytes so far.
It is suggested to use io_uring in userspace(target part of ubd server)
to handle IO request too. And it is still easy for ubdserver to support
io handling by non-io_uring, and this work isn't done yet, but can be
supported easily with help o eventfd.

This way is efficient since no extra io command copy is required, no sleep
is needed in transferring io command to userspace. Meantime the communication
protocol is simple and efficient, one single command of
UBD_IO_COMMIT_AND_FETCH_REQ can handle both fetching io request desc and commit
command result in one trip. IO handling is often batched after single
io_uring_enter() returns, both IO requests from ubd server target and
IO commands could be handled as a whole batch.

Remove RFC now because ubd driver codes gets lots of cleanup, enhancement and
bug fixes since V1:

- cleanup uapi: remove ubd specific error code,  switch to linux error code,
remove one command op, remove one field from cmd_desc

- add monitor mechanism to handle ubq_daemon being killed, ubdsrv[1]
  includes builtin tests for covering heavy IO with deleting ubd / killing
  ubq_daemon at the same time, and V2 pass all the two tests(make test T=generic),
  and the abort/stop mechanism is simple

- fix MQ command buffer mmap bug, and now 'xfstetests -g auto' works well on
  MQ ubd-loop devices(test/scratch)

- improve batching submission as suggested by Jens

- improve handling for starting device, replace random wait/poll with
completion

- all kinds of cleanup, bug fix,..

And the patch by patch change since V1 can be found in the following
tree:

https://github.com/ming1/linux/commits/my_for-5.18-ubd-devel_v2

Todo:
	- add lazy user page release for avoiding cost of pinning user pages in
	ubd_copy_pages() most of time, so we can save CPU for handling io logic
	in userpsace


[1] ubd server
https://github.com/ming1/ubdsrv/commits/devel-v2

[2] qcow2 kernel driver attempt
https://www.spinics.net/lists/kernel/msg4292965.html
https://patchwork.kernel.org/project/linux-block/cover/20190823225619.15530-1-development@manuel-bentele.de/#22841183

[3] [LSF/MM/BPF TOPIC] block drivers in user space
https://lore.kernel.org/linux-block/87tucsf0sr.fsf@collabora.com/

[4] Follow up on UBD discussion
https://lore.kernel.org/linux-block/YnsW+utCrosF0lvm@T590/#r

Ming Lei (1):
  ubd: add io_uring based userspace block driver

 drivers/block/Kconfig        |    6 +
 drivers/block/Makefile       |    2 +
 drivers/block/ubd_drv.c      | 1444 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/ubd_cmd.h |  158 ++++
 4 files changed, 1610 insertions(+)
 create mode 100644 drivers/block/ubd_drv.c
 create mode 100644 include/uapi/linux/ubd_cmd.h

-- 
2.31.1


             reply	other threads:[~2022-05-17  5:54 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-17  5:53 Ming Lei [this message]
2022-05-17  5:53 ` [PATCH V2 1/1] ubd: add io_uring based userspace block driver Ming Lei
2022-05-17 10:00   ` Ziyang Zhang
2022-05-17 12:55     ` Ming Lei
2022-05-18  5:53       ` Ziyang Zhang
2022-05-17  8:01 ` [PATCH V2 0/1] " Christoph Hellwig
2022-05-17 14:06 ` Stefan Hajnoczi
2022-05-18  7:09   ` Ming Lei
2022-05-18 10:45     ` Stefan Hajnoczi
2022-05-18 12:53       ` Ming Lei
2022-05-18 15:49         ` Stefan Hajnoczi
2022-05-19  2:42           ` Ming Lei
2022-05-19  9:46             ` Stefan Hajnoczi
2022-05-18  6:38 ` Liu Xiaodong
2022-05-18 13:18   ` Ming Lei
2022-05-23 14:56     ` Liu Xiaodong
2022-05-24  2:59       ` Ming Lei
2022-05-18  9:26 ` Stefan Hajnoczi
2022-05-19 13:33 ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220517055358.3164431-1-ming.lei@redhat.com \
    --to=ming.lei@redhat.com \
    --cc=ZiyangZhang@linux.alibaba.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=james.r.harris@intel.com \
    --cc=krisman@collabora.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stefanha@redhat.com \
    --cc=xiaoguang.wang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.