From: "Michael S. Tsirkin" <mst@redhat.com>
To: Xie Yongji <xieyongji@bytedance.com>
Cc: jasowang@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, virtualization@lists.linux-foundation.org
Subject: Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace
Date: Mon, 19 Oct 2020 13:16:10 -0400	[thread overview]
Message-ID: <20201019130815-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20201019145623.671-1-xieyongji@bytedance.com>

On Mon, Oct 19, 2020 at 10:56:19PM +0800, Xie Yongji wrote:
> This series introduces a framework that can be used to implement
> vDPA devices in a userspace program. The work consists of two
> parts: control path emulation and data path offloading.
> 
> In the control path, the VDUSE driver makes use of a message
> mechanism to forward the actions (get/set features, get/set status,
> get/set config space and set virtqueue states) from the virtio-vdpa
> driver to userspace. Userspace can use read()/write() to
> receive and reply to those control messages.
> 
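
Just to check my understanding of the control path: userspace opens
the VDUSE character device and then services control messages in a
loop, roughly like the sketch below? The struct and field names here
are made up purely for illustration; the real message layout is
whatever include/uapi/linux/vduse.h in patch 2 defines.

  /* Hypothetical userspace control-path loop, NOT the actual UAPI.
   * The real message layout is defined by include/uapi/linux/vduse.h. */
  #include <fcntl.h>
  #include <stdint.h>
  #include <unistd.h>

  struct fake_vduse_msg {        /* invented layout, for illustration */
      uint32_t type;             /* get/set features, status, config... */
      uint32_t request_id;
      uint64_t payload;
  };

  static void serve_control_path(const char *path)
  {
      int fd = open(path, O_RDWR);
      struct fake_vduse_msg msg;

      /* read() blocks until virtio-vdpa forwards a request... */
      while (read(fd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg)) {
          /* ...the device emulation handles it... */
          msg.payload = 0;
          /* ...and write() sends the reply back to the driver. */
          write(fd, &msg, sizeof(msg));
      }
      close(fd);
  }
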
> In the data path, the VDUSE driver implements an MMU-based
> on-chip IOMMU driver which supports both direct mapping and
> indirect mapping with a bounce buffer. Userspace can then access
> that IOVA space via mmap(). In addition, the eventfd mechanism is
> used to trigger interrupts and forward virtqueue kicks.
> 
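
And on the data path, if I read this right, userspace mmap()s the
device's IOVA space (bounce buffer or directly mapped memory) and
uses eventfds in both directions, along these lines? Again just a
hedged sketch; how the fds and the mapping are actually set up is
defined by the series, not by this snippet.

  /* Hedged sketch of a data-path loop, not the actual VDUSE interface. */
  #include <stdint.h>
  #include <sys/eventfd.h>
  #include <sys/mman.h>
  #include <unistd.h>

  static void serve_data_path(int dev_fd, size_t iova_size)
  {
      /* Descriptors, rings and bounce pages become visible to
       * userspace through an mmap() of the IOVA space. */
      void *iova = mmap(NULL, iova_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, dev_fd, 0);

      /* One eventfd to receive virtqueue kicks from the driver and one
       * to inject interrupts back; how they get registered with the
       * device is up to the real UAPI, not shown here. */
      int kick_fd = eventfd(0, EFD_CLOEXEC);
      int call_fd = eventfd(0, EFD_CLOEXEC);

      uint64_t kicks, one = 1;
      while (read(kick_fd, &kicks, sizeof(kicks)) == (ssize_t)sizeof(kicks)) {
          /* process available descriptors found in the mapped IOVA
           * space, then signal a used-buffer interrupt */
          write(call_fd, &one, sizeof(one));
      }

      munmap(iova, iova_size);
      close(kick_fd);
      close(call_fd);
  }
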
> The details and our use case are shown below:
> 
> ------------------------     -----------------------------------------------------------
> |                  APP |     |                          QEMU                           |
> |       ---------      |     | --------------------    -------------------+<-->+------ |
> |       |dev/vdx|      |     | | device emulation |    | virtio dataplane |    | BDS | |
> ------------+-----------     -----------+-----------------------+-----------------+-----
>             |                           |                       |                 |
>             |                           | emulating             | offloading      |
> ------------+---------------------------+-----------------------+-----------------+------
> |    | block device |           |  vduse driver |        |  vdpa device |    | TCP/IP | |
> |    -------+--------           --------+--------        +------+-------     -----+---- |
> |           |                           |                |      |                 |     |
> |           |                           |                |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> | | virtio-blk driver |       | virtio-vdpa driver |     |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> |           |                           |                |      |                 |     |
> |           |                           ------------------      |                 |     |
> |           -----------------------------------------------------              ---+---  |
> ------------------------------------------------------------------------------ | NIC |---
>                                                                                ---+---
>                                                                                   |
>                                                                          ---------+---------
>                                                                          | Remote Storages |
>                                                                          -------------------
> We make use of it to implement a block device that connects to
> our distributed storage and can be used both in containers and
> on bare metal.

What is not entirely clear is what the APP above is doing.

Is it taking virtio-blk requests and sending them over the network
in some proprietary way?

> Compared with the qemu-nbd solution, this solution has higher
> performance, and we can have a unified technology stack for
> remote storage in both VMs and containers.
> 
> To test it with a host disk (e.g. /dev/sdx):
> 
>   $ qemu-storage-daemon \
>       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>       --monitor chardev=charmonitor \
>       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/sdx,node-name=disk0 \
>       --export vduse-blk,id=test,node-name=disk0,writable=on,vduse-id=1,num-queues=16,queue-size=128
> 
> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> 
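
If I follow the diagram, once this vduse device is bound to the
virtio-vdpa driver it should then show up on the host as a regular
virtio-blk disk, the /dev/vdx the APP opens, and can be used by
containers or bare-metal applications like any other block device.
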
> Future work:
>   - Improve performance (e.g. zero copy implementation in datapath)
>   - Config interrupt support
>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)


How does this driver compare with vhost-user-blk (which doesn't need kernel support)?



> Xie Yongji (4):
>   mm: export zap_page_range() for driver use
>   vduse: Introduce VDUSE - vDPA Device in Userspace
>   vduse: grab the module's references until there is no vduse device
>   vduse: Add memory shrinker to reclaim bounce pages
> 
>  drivers/vdpa/Kconfig                 |    8 +
>  drivers/vdpa/Makefile                |    1 +
>  drivers/vdpa/vdpa_user/Makefile      |    5 +
>  drivers/vdpa/vdpa_user/eventfd.c     |  221 ++++++
>  drivers/vdpa/vdpa_user/eventfd.h     |   48 ++
>  drivers/vdpa/vdpa_user/iova_domain.c |  488 ++++++++++++
>  drivers/vdpa/vdpa_user/iova_domain.h |  104 +++
>  drivers/vdpa/vdpa_user/vduse.h       |   66 ++
>  drivers/vdpa/vdpa_user/vduse_dev.c   | 1081 ++++++++++++++++++++++++++
>  include/uapi/linux/vduse.h           |   85 ++
>  mm/memory.c                          |    1 +
>  11 files changed, 2108 insertions(+)
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
> 
> -- 
> 2.25.1



Thread overview: 28+ messages
2020-10-19 14:56 [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2020-10-19 14:56 ` [RFC 1/4] mm: export zap_page_range() for driver use Xie Yongji
2020-10-19 15:14   ` Matthew Wilcox
2020-10-19 15:36     ` [External] " 谢永吉
2020-10-19 14:56 ` [RFC 2/4] vduse: Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2020-10-19 15:08   ` Michael S. Tsirkin
2020-10-19 15:24     ` Randy Dunlap
2020-10-19 15:46       ` [External] " 谢永吉
2020-10-19 15:48     ` 谢永吉
2020-10-19 14:56 ` [RFC 3/4] vduse: grab the module's references until there is no vduse device Xie Yongji
2020-10-19 15:05   ` Michael S. Tsirkin
2020-10-19 15:44     ` [External] " 谢永吉
2020-10-19 15:47       ` Michael S. Tsirkin
2020-10-19 15:56         ` 谢永吉
2020-10-19 16:41           ` Michael S. Tsirkin
2020-10-20  7:42             ` Yongji Xie
2020-10-19 14:56 ` [RFC 4/4] vduse: Add memory shrinker to reclaim bounce pages Xie Yongji
2020-10-19 17:16 ` Michael S. Tsirkin [this message]
2020-10-20  2:18   ` [External] Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace 谢永吉
2020-10-20  2:20     ` Jason Wang
2020-10-20  2:28       ` 谢永吉
2020-10-20  3:20 ` Jason Wang
2020-10-20  7:39   ` [External] " Yongji Xie
2020-10-20  8:01     ` Jason Wang
2020-10-20  8:35       ` Yongji Xie
2020-10-20  9:12         ` Jason Wang
2020-10-23  2:55           ` Yongji Xie
2020-10-23  8:44             ` Jason Wang
