* [PATCH v4 0/6] Support exporting BDSs via VDUSE
@ 2022-04-06 7:59 Xie Yongji
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
Hi all,
A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.
To support that, we first introduce a VDUSE library as a
subproject (as libvhost-user does) to help implement
VDUSE backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and VDUSE library.
Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].
[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html
Please review, thanks!
V3 to V4:
- Fix some comments on QAPI [Eric]
V2 to V3:
- Introduce vduse_get_virtio_features() [Stefan]
- Update MAINTAINERS file [Stefan]
- Fix handler of VIRTIO_BLK_T_GET_ID request [Stefan]
- Add barrier for vduse_queue_inflight_get() [Stefan]
V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new APIs to support creating a device from /dev/vduse/$NAME or
a file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during initialization [Stefan]
- Replace malloc() + memset() with calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support resetting blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifying the
reconnect log file
- Fix some bugs [Stefan]
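For readers following the changelog, the VIRTIO_F_VERSION_1 check noted
under "V1 to V2" can be pictured roughly as below. This is only a minimal
standalone sketch, not the code from the series: demo_has_feature() and
demo_check_features() are hypothetical names, and the bit number 32 is an
assumption taken from the virtio specification, defined locally so that
the sketch builds without the virtio headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: feature bit number from the virtio specification. */
#define DEMO_VIRTIO_F_VERSION_1 32

/* Test a single feature bit in a 64-bit feature mask. */
static inline bool demo_has_feature(uint64_t features, unsigned int fbit)
{
    assert(fbit < 64);
    return !!(features & (1ULL << fbit));
}

/*
 * Hypothetical initialization-time check: reject a feature set that
 * does not offer VIRTIO_F_VERSION_1. Returns 0 on success, -1 otherwise.
 */
static int demo_check_features(uint64_t features)
{
    if (!demo_has_feature(features, DEMO_VIRTIO_F_VERSION_1)) {
        return -1;
    }
    return 0;
}
```

A caller would run such a check once, before creating the device, and
fail the creation when it returns -1.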
Xie Yongji (6):
block: Support passing NULL ops to blk_set_dev_ops()
linux-headers: Add vduse.h
libvduse: Add VDUSE (vDPA Device in Userspace) library
vduse-blk: implements vduse-blk export
vduse-blk: Add vduse-blk resize support
libvduse: Add support for reconnecting
MAINTAINERS | 7 +
block/block-backend.c | 2 +-
block/export/export.c | 6 +
block/export/meson.build | 5 +
block/export/vduse-blk.c | 459 ++++++
block/export/vduse-blk.h | 20 +
linux-headers/linux/vduse.h | 306 ++++
meson.build | 28 +
meson_options.txt | 4 +
qapi/block-export.json | 25 +-
scripts/meson-buildoptions.sh | 7 +
scripts/update-linux-headers.sh | 2 +-
subprojects/libvduse/include/atomic.h | 1 +
subprojects/libvduse/libvduse.c | 1386 +++++++++++++++++++
subprojects/libvduse/libvduse.h | 247 ++++
subprojects/libvduse/linux-headers/linux | 1 +
subprojects/libvduse/meson.build | 10 +
subprojects/libvduse/standard-headers/linux | 1 +
18 files changed, 2513 insertions(+), 4 deletions(-)
create mode 100644 block/export/vduse-blk.c
create mode 100644 block/export/vduse-blk.h
create mode 100644 linux-headers/linux/vduse.h
create mode 120000 subprojects/libvduse/include/atomic.h
create mode 100644 subprojects/libvduse/libvduse.c
create mode 100644 subprojects/libvduse/libvduse.h
create mode 120000 subprojects/libvduse/linux-headers/linux
create mode 100644 subprojects/libvduse/meson.build
create mode 120000 subprojects/libvduse/standard-headers/linux
--
2.20.1
* [PATCH v4 1/6] block: Support passing NULL ops to blk_set_dev_ops()
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.
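The effect of the added NULL check can be modelled outside QEMU with a
small standalone sketch. The Demo* types and demo_set_dev_ops() below are
hypothetical stand-ins for BlockBackend, BlockDevOps and
blk_set_dev_ops(); only the guard itself mirrors the change in this
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDevOps with a single callback. */
typedef struct DemoDevOps {
    void (*drained_begin)(void *opaque);
} DemoDevOps;

/* Hypothetical stand-in for the relevant BlockBackend fields. */
typedef struct DemoBackend {
    const DemoDevOps *dev_ops;
    void *dev_opaque;
    int quiesce_counter;
} DemoBackend;

/* Install (or, with ops == NULL, remove) the device callbacks. */
static void demo_set_dev_ops(DemoBackend *blk, const DemoDevOps *ops,
                             void *opaque)
{
    blk->dev_ops = ops;
    blk->dev_opaque = opaque;

    /* Without the "ops &&" test, a NULL ops would be dereferenced here. */
    if (blk->quiesce_counter && ops && ops->drained_begin) {
        ops->drained_begin(opaque);
    }
}

/* Example callback: counts how often it was invoked. */
static void demo_count_drain(void *opaque)
{
    ++*(int *)opaque;
}
```

With a quiesced backend, installing real ops still triggers
drained_begin(), while passing NULL simply clears the stale pointers.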
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
block/block-backend.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..35457a6a1d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1062,7 +1062,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
blk->dev_opaque = opaque;
/* Are we currently quiesced? Should we enforce this right now? */
- if (blk->quiesce_counter && ops->drained_begin) {
+ if (blk->quiesce_counter && ops && ops->drained_begin) {
ops->drained_begin(opaque);
}
}
--
2.20.1
* [PATCH v4 2/6] linux-headers: Add vduse.h
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
This adds the VDUSE header to linux-headers so that the
relevant VDUSE API can be used in subsequent patches.
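As a rough illustration of how a structure with a trailing flexible
array, such as struct vduse_dev_config, is typically prepared before
VDUSE_CREATE_DEV, consider the sketch below. It is an assumption-laden
example, not code from this series: struct demo_dev_config is a local
mirror of the layout using <stdint.h> types so that it builds without
the kernel header, demo_dev_config_new() is a hypothetical helper, and
the 4096-byte vq_align is just an example value. The actual ioctl call is
left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_MAX 256

/* Local mirror of the struct vduse_dev_config layout (illustrative). */
struct demo_dev_config {
    char name[DEMO_NAME_MAX];
    uint32_t vendor_id;
    uint32_t device_id;
    uint64_t features;
    uint32_t vq_num;
    uint32_t vq_align;
    uint32_t reserved[13];
    uint32_t config_size;
    uint8_t config[];
};

/*
 * Hypothetical helper: allocate and fill a device config. calloc()
 * zero-fills the block, so the name stays NUL terminated and the
 * reserved fields are zero, as the header comments require.
 * Returns NULL on an over-long name or allocation failure.
 */
static struct demo_dev_config *
demo_dev_config_new(const char *name, uint32_t device_id, uint64_t features,
                    uint32_t vq_num, const void *config, uint32_t config_size)
{
    struct demo_dev_config *cfg;
    size_t name_len = strlen(name);

    if (name_len >= DEMO_NAME_MAX) {
        return NULL;
    }

    cfg = calloc(1, sizeof(*cfg) + config_size);
    if (!cfg) {
        return NULL;
    }

    memcpy(cfg->name, name, name_len);
    cfg->device_id = device_id;
    cfg->features = features;
    cfg->vq_num = vq_num;
    cfg->vq_align = 4096; /* example alignment, an assumption */
    cfg->config_size = config_size;
    if (config_size) {
        memcpy(cfg->config, config, config_size);
    }
    return cfg;
}
```

The caller owns the returned buffer and releases it with free() once the
device has been created.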
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
linux-headers/linux/vduse.h | 306 ++++++++++++++++++++++++++++++++
scripts/update-linux-headers.sh | 2 +-
2 files changed, 307 insertions(+), 1 deletion(-)
create mode 100644 linux-headers/linux/vduse.h
diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 0000000000..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include <linux/types.h>
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION 0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+ char name[VDUSE_NAME_MAX];
+ __u32 vendor_id;
+ __u32 device_id;
+ __u64 features;
+ __u32 vq_num;
+ __u32 vq_align;
+ __u32 reserved[13];
+ __u32 config_size;
+ __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+ __u64 offset;
+ __u64 start;
+ __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+ __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+ __u32 offset;
+ __u32 length;
+ __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+ __u32 index;
+ __u16 max_size;
+ __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @avail_index: available index
+ */
+struct vduse_vq_state_split {
+ __u16 avail_index;
+};
+
+/**
+ * struct vduse_vq_state_packed - packed virtqueue state
+ * @last_avail_counter: last driver ring wrap counter observed by device
+ * @last_avail_idx: device available index
+ * @last_used_counter: device ring wrap counter
+ * @last_used_idx: used index
+ */
+struct vduse_vq_state_packed {
+ __u16 last_avail_counter;
+ __u16 last_avail_idx;
+ __u16 last_used_counter;
+ __u16 last_used_idx;
+};
+
+/**
+ * struct vduse_vq_info - information of a virtqueue
+ * @index: virtqueue index
+ * @num: the size of virtqueue
+ * @desc_addr: address of desc area
+ * @driver_addr: address of driver area
+ * @device_addr: address of device area
+ * @split: split virtqueue state
+ * @packed: packed virtqueue state
+ * @ready: ready status of virtqueue
+ *
+ * Structure used by VDUSE_VQ_GET_INFO ioctl to get virtqueue's information.
+ */
+struct vduse_vq_info {
+ __u32 index;
+ __u32 num;
+ __u64 desc_addr;
+ __u64 driver_addr;
+ __u64 device_addr;
+ union {
+ struct vduse_vq_state_split split;
+ struct vduse_vq_state_packed packed;
+ };
+ __u8 ready;
+};
+
+/* Get the specified virtqueue's information. Caller should set index field. */
+#define VDUSE_VQ_GET_INFO _IOWR(VDUSE_BASE, 0x15, struct vduse_vq_info)
+
+/**
+ * struct vduse_vq_eventfd - eventfd configuration for a virtqueue
+ * @index: virtqueue index
+ * @fd: eventfd, -1 means de-assigning the eventfd
+ *
+ * Structure used by VDUSE_VQ_SETUP_KICKFD ioctl to setup kick eventfd.
+ */
+struct vduse_vq_eventfd {
+ __u32 index;
+#define VDUSE_EVENTFD_DEASSIGN -1
+ int fd;
+};
+
+/*
+ * Setup kick eventfd for specified virtqueue. The kick eventfd is used
+ * by VDUSE kernel module to notify userspace to consume the avail vring.
+ */
+#define VDUSE_VQ_SETUP_KICKFD _IOW(VDUSE_BASE, 0x16, struct vduse_vq_eventfd)
+
+/*
+ * Inject an interrupt for specific virtqueue. It's used to notify virtio driver
+ * to consume the used vring.
+ */
+#define VDUSE_VQ_INJECT_IRQ _IOW(VDUSE_BASE, 0x17, __u32)
+
+/* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
+
+/**
+ * enum vduse_req_type - request type
+ * @VDUSE_GET_VQ_STATE: get the state for specified virtqueue from userspace
+ * @VDUSE_SET_STATUS: set the device status
+ * @VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for
+ * specified IOVA range via VDUSE_IOTLB_GET_FD ioctl
+ */
+enum vduse_req_type {
+ VDUSE_GET_VQ_STATE,
+ VDUSE_SET_STATUS,
+ VDUSE_UPDATE_IOTLB,
+};
+
+/**
+ * struct vduse_vq_state - virtqueue state
+ * @index: virtqueue index
+ * @split: split virtqueue state
+ * @packed: packed virtqueue state
+ */
+struct vduse_vq_state {
+ __u32 index;
+ union {
+ struct vduse_vq_state_split split;
+ struct vduse_vq_state_packed packed;
+ };
+};
+
+/**
+ * struct vduse_dev_status - device status
+ * @status: device status
+ */
+struct vduse_dev_status {
+ __u8 status;
+};
+
+/**
+ * struct vduse_iova_range - IOVA range [start, last]
+ * @start: start of the IOVA range
+ * @last: last of the IOVA range
+ */
+struct vduse_iova_range {
+ __u64 start;
+ __u64 last;
+};
+
+/**
+ * struct vduse_dev_request - control request
+ * @type: request type
+ * @request_id: request id
+ * @reserved: for future use
+ * @vq_state: virtqueue state, only index field is available
+ * @s: device status
+ * @iova: IOVA range for updating
+ * @padding: padding
+ *
+ * Structure used by read(2) on /dev/vduse/$NAME.
+ */
+struct vduse_dev_request {
+ __u32 type;
+ __u32 request_id;
+ __u32 reserved[4];
+ union {
+ struct vduse_vq_state vq_state;
+ struct vduse_dev_status s;
+ struct vduse_iova_range iova;
+ __u32 padding[32];
+ };
+};
+
+/**
+ * struct vduse_dev_response - response to control request
+ * @request_id: corresponding request id
+ * @result: the result of request
+ * @reserved: for future use, needs to be initialized to zero
+ * @vq_state: virtqueue state
+ * @padding: padding
+ *
+ * Structure used by write(2) on /dev/vduse/$NAME.
+ */
+struct vduse_dev_response {
+ __u32 request_id;
+#define VDUSE_REQ_RESULT_OK 0x00
+#define VDUSE_REQ_RESULT_FAILED 0x01
+ __u32 result;
+ __u32 reserved[4];
+ union {
+ struct vduse_vq_state vq_state;
+ __u32 padding[32];
+ };
+};
+
+#endif /* _VDUSE_H_ */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 839a5ec614..b1ad99cba8 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -161,7 +161,7 @@ done
rm -rf "$output/linux-headers/linux"
mkdir -p "$output/linux-headers/linux"
for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
- psci.h psp-sev.h userfaultfd.h mman.h; do
+ psci.h psp-sev.h userfaultfd.h mman.h vduse.h; do
cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
done
--
2.20.1
* [PATCH v4 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.
[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
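One building block of such a library is translating a guest IOVA into a
userspace address while clamping the requested length to the end of the
containing region, which is what iova_to_va() in this patch does. The
sketch below shows only that lookup-and-clamp step under simplifying
assumptions: DemoRegion and demo_iova_to_va() are hypothetical names, a
region is backed by a plain buffer rather than an mmap()ed file
descriptor, and the fallback that fetches a missing region through the
VDUSE_IOTLB_GET_FD ioctl is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region descriptor. */
typedef struct DemoRegion {
    uint64_t iova;  /* first IOVA covered by the region */
    uint64_t size;  /* length in bytes; 0 marks an unused slot */
    uint8_t *base;  /* userspace address that 'iova' maps to */
} DemoRegion;

/*
 * Translate 'iova' to a userspace pointer. On success, *plen is reduced
 * if the request would run past the end of the region, so the caller
 * can continue with the remainder in another lookup. Returns NULL when
 * no region covers 'iova'.
 */
static void *demo_iova_to_va(const DemoRegion *regions, size_t n,
                             uint64_t *plen, uint64_t iova)
{
    for (size_t i = 0; i < n; i++) {
        const DemoRegion *r = &regions[i];

        if (!r->size) {
            continue;
        }
        if (iova >= r->iova && iova - r->iova < r->size) {
            uint64_t avail = r->size - (iova - r->iova);

            if (*plen > avail) {
                *plen = avail;
            }
            return r->base + (iova - r->iova);
        }
    }
    return NULL;
}
```

Callers that map a buffer spanning several regions loop on the returned
length, much as vduse_queue_map_single_desc() does in this patch.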
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
MAINTAINERS | 5 +
meson.build | 15 +
meson_options.txt | 2 +
scripts/meson-buildoptions.sh | 3 +
subprojects/libvduse/include/atomic.h | 1 +
subprojects/libvduse/libvduse.c | 1161 +++++++++++++++++++
subprojects/libvduse/libvduse.h | 235 ++++
subprojects/libvduse/linux-headers/linux | 1 +
subprojects/libvduse/meson.build | 10 +
subprojects/libvduse/standard-headers/linux | 1 +
10 files changed, 1434 insertions(+)
create mode 120000 subprojects/libvduse/include/atomic.h
create mode 100644 subprojects/libvduse/libvduse.c
create mode 100644 subprojects/libvduse/libvduse.h
create mode 120000 subprojects/libvduse/linux-headers/linux
create mode 100644 subprojects/libvduse/meson.build
create mode 120000 subprojects/libvduse/standard-headers/linux
diff --git a/MAINTAINERS b/MAINTAINERS
index 9aed5f3e04..53a14bf7a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3547,6 +3547,11 @@ L: qemu-block@nongnu.org
S: Supported
F: block/export/fuse.c
+VDUSE library
+M: Xie Yongji <xieyongji@bytedance.com>
+S: Maintained
+F: subprojects/libvduse/
+
Replication
M: Wen Congyang <wencongyang2@huawei.com>
M: Xie Changlong <xiechanglong.d@gmail.com>
diff --git a/meson.build b/meson.build
index bae62efc9c..5c71904461 100644
--- a/meson.build
+++ b/meson.build
@@ -1351,6 +1351,21 @@ if get_option('fuse_lseek').allowed()
endif
endif
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+ if targetos != 'linux'
+ error('libvduse requires linux')
+ endif
+elif get_option('libvduse').disabled()
+ have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+ libvduse_proj = subproject('libvduse')
+ libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
# libbpf
libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..e25af3277d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -219,6 +219,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+ description: 'build VDUSE Library')
option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 1e26f4571e..ccab9ca9da 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -77,6 +77,7 @@ meson_options_help() {
printf "%s\n" ' libssh ssh block device support'
printf "%s\n" ' libudev Use libudev to enumerate host devices'
printf "%s\n" ' libusb libusb support for USB passthrough'
+ printf "%s\n" ' libvduse build VDUSE Library'
printf "%s\n" ' linux-aio Linux AIO support'
printf "%s\n" ' linux-io-uring Linux io_uring support'
printf "%s\n" ' live-block-migration'
@@ -244,6 +245,8 @@ _meson_option_parse() {
--disable-libudev) printf "%s" -Dlibudev=disabled ;;
--enable-libusb) printf "%s" -Dlibusb=enabled ;;
--disable-libusb) printf "%s" -Dlibusb=disabled ;;
+ --enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+ --disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
--enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
--disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
--enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 0000000000..ecee9c0568
--- /dev/null
+++ b/subprojects/libvduse/libvduse.c
@@ -0,0 +1,1161 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ * Portions of codes and concepts borrowed from libvhost-user.c, so:
+ * Copyright IBM, Corp. 2007
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author:
+ * Xie Yongji <xieyongji@bytedance.com>
+ * Anthony Liguori <aliguori@us.ibm.com>
+ * Marc-André Lureau <mlureau@redhat.com>
+ * Victor Kaplansky <victork@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <errno.h>
+#include <string.h>
+#include <assert.h>
+#include <endian.h>
+#include <unistd.h>
+#include <limits.h>
+#include <fcntl.h>
+
+#include <sys/ioctl.h>
+#include <sys/eventfd.h>
+#include <sys/mman.h>
+
+#include "include/atomic.h"
+#include "linux-headers/linux/virtio_ring.h"
+#include "linux-headers/linux/virtio_config.h"
+#include "linux-headers/linux/vduse.h"
+#include "libvduse.h"
+
+#define VDUSE_VQ_ALIGN 4096
+#define MAX_IOVA_REGIONS 256
+
+/* Round number down to multiple */
+#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
+
+#ifndef unlikely
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#endif
+
+typedef struct VduseRing {
+ unsigned int num;
+ uint64_t desc_addr;
+ uint64_t avail_addr;
+ uint64_t used_addr;
+ struct vring_desc *desc;
+ struct vring_avail *avail;
+ struct vring_used *used;
+} VduseRing;
+
+struct VduseVirtq {
+ VduseRing vring;
+ uint16_t last_avail_idx;
+ uint16_t shadow_avail_idx;
+ uint16_t used_idx;
+ uint16_t signalled_used;
+ bool signalled_used_valid;
+ int index;
+ int inuse;
+ bool ready;
+ int fd;
+ VduseDev *dev;
+};
+
+typedef struct VduseIovaRegion {
+ uint64_t iova;
+ uint64_t size;
+ uint64_t mmap_offset;
+ uint64_t mmap_addr;
+} VduseIovaRegion;
+
+struct VduseDev {
+ VduseVirtq *vqs;
+ VduseIovaRegion regions[MAX_IOVA_REGIONS];
+ int num_regions;
+ char *name;
+ uint32_t device_id;
+ uint32_t vendor_id;
+ uint16_t num_queues;
+ uint16_t queue_size;
+ uint64_t features;
+ const VduseOps *ops;
+ int fd;
+ int ctrl_fd;
+ void *priv;
+};
+
+static inline bool has_feature(uint64_t features, unsigned int fbit)
+{
+ assert(fbit < 64);
+ return !!(features & (1ULL << fbit));
+}
+
+static inline bool vduse_dev_has_feature(VduseDev *dev, unsigned int fbit)
+{
+ return has_feature(dev->features, fbit);
+}
+
+uint64_t vduse_get_virtio_features(void)
+{
+ return (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+ (1ULL << VIRTIO_F_VERSION_1) |
+ (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
+ (1ULL << VIRTIO_RING_F_EVENT_IDX) |
+ (1ULL << VIRTIO_RING_F_INDIRECT_DESC);
+}
+
+VduseDev *vduse_queue_get_dev(VduseVirtq *vq)
+{
+ return vq->dev;
+}
+
+int vduse_queue_get_fd(VduseVirtq *vq)
+{
+ return vq->fd;
+}
+
+void *vduse_dev_get_priv(VduseDev *dev)
+{
+ return dev->priv;
+}
+
+VduseVirtq *vduse_dev_get_queue(VduseDev *dev, int index)
+{
+ return &dev->vqs[index];
+}
+
+int vduse_dev_get_fd(VduseDev *dev)
+{
+ return dev->fd;
+}
+
+static int vduse_inject_irq(VduseDev *dev, int index)
+{
+ return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
+}
+
+static void vduse_iova_remove_region(VduseDev *dev, uint64_t start,
+ uint64_t last)
+{
+ int i;
+
+ if (last == start) {
+ return;
+ }
+
+ for (i = 0; i < MAX_IOVA_REGIONS; i++) {
+ if (!dev->regions[i].mmap_addr) {
+ continue;
+ }
+
+ if (start <= dev->regions[i].iova &&
+ last >= (dev->regions[i].iova + dev->regions[i].size - 1)) {
+ munmap((void *)dev->regions[i].mmap_addr,
+ dev->regions[i].mmap_offset + dev->regions[i].size);
+ dev->regions[i].mmap_addr = 0;
+ dev->num_regions--;
+ }
+ }
+}
+
+static int vduse_iova_add_region(VduseDev *dev, int fd,
+ uint64_t offset, uint64_t start,
+ uint64_t last, int prot)
+{
+ int i;
+ uint64_t size = last - start + 1;
+ void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);
+
+ if (mmap_addr == MAP_FAILED) {
+ close(fd);
+ return -EINVAL;
+ }
+
+ for (i = 0; i < MAX_IOVA_REGIONS; i++) {
+ if (!dev->regions[i].mmap_addr) {
+ dev->regions[i].mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
+ dev->regions[i].mmap_offset = offset;
+ dev->regions[i].iova = start;
+ dev->regions[i].size = size;
+ dev->num_regions++;
+ break;
+ }
+ }
+ assert(i < MAX_IOVA_REGIONS);
+ close(fd);
+
+ return 0;
+}
+
+static int perm_to_prot(uint8_t perm)
+{
+ int prot = 0;
+
+ switch (perm) {
+ case VDUSE_ACCESS_WO:
+ prot |= PROT_WRITE;
+ break;
+ case VDUSE_ACCESS_RO:
+ prot |= PROT_READ;
+ break;
+ case VDUSE_ACCESS_RW:
+ prot |= PROT_READ | PROT_WRITE;
+ break;
+ default:
+ break;
+ }
+
+ return prot;
+}
+
+static inline void *iova_to_va(VduseDev *dev, uint64_t *plen, uint64_t iova)
+{
+ int i, ret;
+ struct vduse_iotlb_entry entry;
+
+ for (i = 0; i < MAX_IOVA_REGIONS; i++) {
+ VduseIovaRegion *r = &dev->regions[i];
+
+ if (!r->mmap_addr) {
+ continue;
+ }
+
+ if ((iova >= r->iova) && (iova < (r->iova + r->size))) {
+ if ((iova + *plen) > (r->iova + r->size)) {
+ *plen = r->iova + r->size - iova;
+ }
+ return (void *)(uintptr_t)(iova - r->iova +
+ r->mmap_addr + r->mmap_offset);
+ }
+ }
+
+ entry.start = iova;
+ entry.last = iova + 1;
+ ret = ioctl(dev->fd, VDUSE_IOTLB_GET_FD, &entry);
+ if (ret < 0) {
+ return NULL;
+ }
+
+ if (!vduse_iova_add_region(dev, ret, entry.offset, entry.start,
+ entry.last, perm_to_prot(entry.perm))) {
+ return iova_to_va(dev, plen, iova);
+ }
+
+ return NULL;
+}
+
+static inline uint16_t vring_avail_flags(VduseVirtq *vq)
+{
+ return le16toh(vq->vring.avail->flags);
+}
+
+static inline uint16_t vring_avail_idx(VduseVirtq *vq)
+{
+ vq->shadow_avail_idx = le16toh(vq->vring.avail->idx);
+
+ return vq->shadow_avail_idx;
+}
+
+static inline uint16_t vring_avail_ring(VduseVirtq *vq, int i)
+{
+ return le16toh(vq->vring.avail->ring[i]);
+}
+
+static inline uint16_t vring_get_used_event(VduseVirtq *vq)
+{
+ return vring_avail_ring(vq, vq->vring.num);
+}
+
+static bool vduse_queue_get_head(VduseVirtq *vq, unsigned int idx,
+ unsigned int *head)
+{
+ /*
+ * Grab the next descriptor number they're advertising, and increment
+ * the index we've seen.
+ */
+ *head = vring_avail_ring(vq, idx % vq->vring.num);
+
+ /* If their number is silly, that's a fatal mistake. */
+ if (*head >= vq->vring.num) {
+ fprintf(stderr, "Guest says index %u is available\n", *head);
+ return false;
+ }
+
+ return true;
+}
+
+static int
+vduse_queue_read_indirect_desc(VduseDev *dev, struct vring_desc *desc,
+ uint64_t addr, size_t len)
+{
+ struct vring_desc *ori_desc;
+ uint64_t read_len;
+
+ if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc))) {
+ return -1;
+ }
+
+ if (len == 0) {
+ return -1;
+ }
+
+ while (len) {
+ read_len = len;
+ ori_desc = iova_to_va(dev, &read_len, addr);
+ if (!ori_desc) {
+ return -1;
+ }
+
+ memcpy(desc, ori_desc, read_len);
+ len -= read_len;
+ addr += read_len;
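+ /* read_len is in bytes, so advance desc by bytes, not by elements */
+ desc = (struct vring_desc *)((char *)desc + read_len);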
+ }
+
+ return 0;
+}
+
+enum {
+ VIRTQUEUE_READ_DESC_ERROR = -1,
+ VIRTQUEUE_READ_DESC_DONE = 0, /* end of chain */
+ VIRTQUEUE_READ_DESC_MORE = 1, /* more buffers in chain */
+};
+
+static int vduse_queue_read_next_desc(struct vring_desc *desc, int i,
+ unsigned int max, unsigned int *next)
+{
+ /* If this descriptor says it doesn't chain, we're done. */
+ if (!(le16toh(desc[i].flags) & VRING_DESC_F_NEXT)) {
+ return VIRTQUEUE_READ_DESC_DONE;
+ }
+
+ /* Check they're not leading us off end of descriptors. */
+ *next = le16toh(desc[i].next);
+ /* Make sure compiler knows to grab that: we don't want it changing! */
+ smp_wmb();
+
+ if (*next >= max) {
+ fprintf(stderr, "Desc next is %u\n", *next);
+ return VIRTQUEUE_READ_DESC_ERROR;
+ }
+
+ return VIRTQUEUE_READ_DESC_MORE;
+}
+
+/*
+ * Fetch avail_idx from VQ memory only when we really need to know if
+ * guest has added some buffers.
+ */
+static bool vduse_queue_empty(VduseVirtq *vq)
+{
+ if (unlikely(!vq->vring.avail)) {
+ return true;
+ }
+
+ if (vq->shadow_avail_idx != vq->last_avail_idx) {
+ return false;
+ }
+
+ return vring_avail_idx(vq) == vq->last_avail_idx;
+}
+
+static bool vduse_queue_should_notify(VduseVirtq *vq)
+{
+ VduseDev *dev = vq->dev;
+ uint16_t old, new;
+ bool v;
+
+ /* We need to expose used array entries before checking used event. */
+ smp_mb();
+
+ /* Always notify when queue is empty (when feature acknowledged) */
+ if (vduse_dev_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
+ !vq->inuse && vduse_queue_empty(vq)) {
+ return true;
+ }
+
+ if (!vduse_dev_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+ return !(vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT);
+ }
+
+ v = vq->signalled_used_valid;
+ vq->signalled_used_valid = true;
+ old = vq->signalled_used;
+ new = vq->signalled_used = vq->used_idx;
+ return !v || vring_need_event(vring_get_used_event(vq), new, old);
+}
+
+void vduse_queue_notify(VduseVirtq *vq)
+{
+ VduseDev *dev = vq->dev;
+
+ if (unlikely(!vq->vring.avail)) {
+ return;
+ }
+
+ if (!vduse_queue_should_notify(vq)) {
+ return;
+ }
+
+ if (vduse_inject_irq(dev, vq->index) < 0) {
+ fprintf(stderr, "Error injecting irq for vq %d: %s\n",
+ vq->index, strerror(errno));
+ }
+}
+
+static inline void vring_used_flags_set_bit(VduseVirtq *vq, int mask)
+{
+ uint16_t *flags;
+
+ flags = (uint16_t *)((char*)vq->vring.used +
+ offsetof(struct vring_used, flags));
+ *flags = htole16(le16toh(*flags) | mask);
+}
+
+static inline void vring_used_flags_unset_bit(VduseVirtq *vq, int mask)
+{
+ uint16_t *flags;
+
+ flags = (uint16_t *)((char*)vq->vring.used +
+ offsetof(struct vring_used, flags));
+ *flags = htole16(le16toh(*flags) & ~mask);
+}
+
+static inline void vring_set_avail_event(VduseVirtq *vq, uint16_t val)
+{
+ *((uint16_t *)&vq->vring.used->ring[vq->vring.num]) = htole16(val);
+}
+
+static bool vduse_queue_map_single_desc(VduseVirtq *vq, unsigned int *p_num_sg,
+ struct iovec *iov, unsigned int max_num_sg,
+ bool is_write, uint64_t pa, size_t sz)
+{
+ unsigned num_sg = *p_num_sg;
+ VduseDev *dev = vq->dev;
+
+ assert(num_sg <= max_num_sg);
+
+ if (!sz) {
+ fprintf(stderr, "virtio: zero sized buffers are not allowed\n");
+ return false;
+ }
+
+ while (sz) {
+ uint64_t len = sz;
+
+ if (num_sg == max_num_sg) {
+ fprintf(stderr,
+ "virtio: too many descriptors in indirect table\n");
+ return false;
+ }
+
+ iov[num_sg].iov_base = iova_to_va(dev, &len, pa);
+ if (iov[num_sg].iov_base == NULL) {
+ fprintf(stderr, "virtio: invalid address for buffers\n");
+ return false;
+ }
+ iov[num_sg++].iov_len = len;
+ sz -= len;
+ pa += len;
+ }
+
+ *p_num_sg = num_sg;
+ return true;
+}
+
+static void *vduse_queue_alloc_element(size_t sz, unsigned out_num,
+ unsigned in_num)
+{
+ VduseVirtqElement *elem;
+ size_t in_sg_ofs = ALIGN_UP(sz, __alignof__(elem->in_sg[0]));
+ size_t out_sg_ofs = in_sg_ofs + in_num * sizeof(elem->in_sg[0]);
+ size_t out_sg_end = out_sg_ofs + out_num * sizeof(elem->out_sg[0]);
+
+ assert(sz >= sizeof(VduseVirtqElement));
+ elem = malloc(out_sg_end);
+ if (!elem) {
+ return NULL;
+ }
+ elem->out_num = out_num;
+ elem->in_num = in_num;
+ elem->in_sg = (void *)elem + in_sg_ofs;
+ elem->out_sg = (void *)elem + out_sg_ofs;
+ return elem;
+}
+
+static void *vduse_queue_map_desc(VduseVirtq *vq, unsigned int idx, size_t sz)
+{
+ struct vring_desc *desc = vq->vring.desc;
+ VduseDev *dev = vq->dev;
+ uint64_t desc_addr, read_len;
+ unsigned int desc_len;
+ unsigned int max = vq->vring.num;
+ unsigned int i = idx;
+ VduseVirtqElement *elem;
+ struct iovec iov[VIRTQUEUE_MAX_SIZE];
+ struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
+ unsigned int out_num = 0, in_num = 0;
+ int rc;
+
+ if (le16toh(desc[i].flags) & VRING_DESC_F_INDIRECT) {
+ if (le32toh(desc[i].len) % sizeof(struct vring_desc)) {
+ fprintf(stderr, "Invalid size for indirect buffer table\n");
+ return NULL;
+ }
+
+ /* loop over the indirect descriptor table */
+ desc_addr = le64toh(desc[i].addr);
+ desc_len = le32toh(desc[i].len);
+ max = desc_len / sizeof(struct vring_desc);
+ read_len = desc_len;
+ desc = iova_to_va(dev, &read_len, desc_addr);
+ if (unlikely(desc && read_len != desc_len)) {
+ /* Failed to use zero copy */
+ desc = NULL;
+ if (!vduse_queue_read_indirect_desc(dev, desc_buf,
+ desc_addr,
+ desc_len)) {
+ desc = desc_buf;
+ }
+ }
+ if (!desc) {
+ fprintf(stderr, "Invalid indirect buffer table\n");
+ return NULL;
+ }
+ i = 0;
+ }
+
+ /* Collect all the descriptors */
+ do {
+ if (le16toh(desc[i].flags) & VRING_DESC_F_WRITE) {
+ if (!vduse_queue_map_single_desc(vq, &in_num, iov + out_num,
+ VIRTQUEUE_MAX_SIZE - out_num,
+ true, le64toh(desc[i].addr),
+ le32toh(desc[i].len))) {
+ return NULL;
+ }
+ } else {
+ if (in_num) {
+ fprintf(stderr, "Incorrect order for descriptors\n");
+ return NULL;
+ }
+ if (!vduse_queue_map_single_desc(vq, &out_num, iov,
+ VIRTQUEUE_MAX_SIZE, false,
+ le64toh(desc[i].addr),
+ le32toh(desc[i].len))) {
+ return NULL;
+ }
+ }
+
+ /* If we've got too many, that implies a descriptor loop. */
+ if ((in_num + out_num) > max) {
+ fprintf(stderr, "Looped descriptor\n");
+ return NULL;
+ }
+ rc = vduse_queue_read_next_desc(desc, i, max, &i);
+ } while (rc == VIRTQUEUE_READ_DESC_MORE);
+
+ if (rc == VIRTQUEUE_READ_DESC_ERROR) {
+ fprintf(stderr, "read descriptor error\n");
+ return NULL;
+ }
+
+ /* Now copy what we have collected and mapped */
+ elem = vduse_queue_alloc_element(sz, out_num, in_num);
+ if (!elem) {
+ fprintf(stderr, "Failed to allocate virtqueue element\n");
+ return NULL;
+ }
+ elem->index = idx;
+ for (i = 0; i < out_num; i++) {
+ elem->out_sg[i] = iov[i];
+ }
+ for (i = 0; i < in_num; i++) {
+ elem->in_sg[i] = iov[out_num + i];
+ }
+
+ return elem;
+}
+
+void *vduse_queue_pop(VduseVirtq *vq, size_t sz)
+{
+ unsigned int head;
+ VduseVirtqElement *elem;
+ VduseDev *dev = vq->dev;
+
+ if (unlikely(!vq->vring.avail)) {
+ return NULL;
+ }
+
+ if (vduse_queue_empty(vq)) {
+ return NULL;
+ }
+ /* Needed after vduse_queue_empty() */
+ smp_rmb();
+
+ if (vq->inuse >= vq->vring.num) {
+ fprintf(stderr, "Virtqueue size exceeded: %d\n", vq->inuse);
+ return NULL;
+ }
+
+ if (!vduse_queue_get_head(vq, vq->last_avail_idx++, &head)) {
+ return NULL;
+ }
+
+ if (vduse_dev_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+ vring_set_avail_event(vq, vq->last_avail_idx);
+ }
+
+ elem = vduse_queue_map_desc(vq, head, sz);
+
+ if (!elem) {
+ return NULL;
+ }
+
+ vq->inuse++;
+
+ return elem;
+}
+
+static inline void vring_used_write(VduseVirtq *vq,
+ struct vring_used_elem *uelem, int i)
+{
+ struct vring_used *used = vq->vring.used;
+
+ used->ring[i] = *uelem;
+}
+
+static void vduse_queue_fill(VduseVirtq *vq, const VduseVirtqElement *elem,
+ unsigned int len, unsigned int idx)
+{
+ struct vring_used_elem uelem;
+
+ if (unlikely(!vq->vring.used)) {
+ return;
+ }
+
+ idx = (idx + vq->used_idx) % vq->vring.num;
+
+ uelem.id = htole32(elem->index);
+ uelem.len = htole32(len);
+ vring_used_write(vq, &uelem, idx);
+}
+
+static inline void vring_used_idx_set(VduseVirtq *vq, uint16_t val)
+{
+ vq->vring.used->idx = htole16(val);
+ vq->used_idx = val;
+}
+
+static void vduse_queue_flush(VduseVirtq *vq, unsigned int count)
+{
+ uint16_t old, new;
+
+ if (unlikely(!vq->vring.used)) {
+ return;
+ }
+
+ /* Make sure buffer is written before we update index. */
+ smp_wmb();
+
+ old = vq->used_idx;
+ new = old + count;
+ vring_used_idx_set(vq, new);
+ vq->inuse -= count;
+ if (unlikely((int16_t)(new - vq->signalled_used) < (uint16_t)(new - old))) {
+ vq->signalled_used_valid = false;
+ }
+}
+
+void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem,
+ unsigned int len)
+{
+ vduse_queue_fill(vq, elem, len, 0);
+ vduse_queue_flush(vq, 1);
+}
+
+static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr,
+ uint64_t avail_addr, uint64_t used_addr)
+{
+ struct VduseDev *dev = vq->dev;
+ uint64_t len;
+
+ len = sizeof(struct vring_desc);
+ vq->vring.desc = iova_to_va(dev, &len, desc_addr);
+ assert(len == sizeof(struct vring_desc));
+
+ len = sizeof(struct vring_avail);
+ vq->vring.avail = iova_to_va(dev, &len, avail_addr);
+ assert(len == sizeof(struct vring_avail));
+
+ len = sizeof(struct vring_used);
+ vq->vring.used = iova_to_va(dev, &len, used_addr);
+ assert(len == sizeof(struct vring_used));
+
+ if (!vq->vring.desc || !vq->vring.avail || !vq->vring.used) {
+ fprintf(stderr, "Failed to get vq[%d] iova mapping\n", vq->index);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void vduse_queue_enable(VduseVirtq *vq)
+{
+ struct VduseDev *dev = vq->dev;
+ struct vduse_vq_info vq_info;
+ struct vduse_vq_eventfd vq_eventfd;
+ int fd;
+
+ vq_info.index = vq->index;
+ if (ioctl(dev->fd, VDUSE_VQ_GET_INFO, &vq_info)) {
+ fprintf(stderr, "Failed to get vq[%d] info: %s\n",
+ vq->index, strerror(errno));
+ return;
+ }
+
+ if (!vq_info.ready) {
+ return;
+ }
+
+ vq->vring.num = vq_info.num;
+ vq->vring.desc_addr = vq_info.desc_addr;
+ vq->vring.avail_addr = vq_info.driver_addr;
+ vq->vring.used_addr = vq_info.device_addr;
+
+ if (vduse_queue_update_vring(vq, vq_info.desc_addr,
+ vq_info.driver_addr, vq_info.device_addr)) {
+ fprintf(stderr, "Failed to update vring for vq[%d]\n", vq->index);
+ return;
+ }
+
+ fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+ if (fd < 0) {
+ fprintf(stderr, "Failed to init eventfd for vq[%d]\n", vq->index);
+ return;
+ }
+
+ vq_eventfd.index = vq->index;
+ vq_eventfd.fd = fd;
+ if (ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &vq_eventfd)) {
+ fprintf(stderr, "Failed to setup kick fd for vq[%d]\n", vq->index);
+ close(fd);
+ return;
+ }
+
+ vq->fd = fd;
+ vq->shadow_avail_idx = vq->last_avail_idx = vq_info.split.avail_index;
+ vq->inuse = 0;
+ vq->used_idx = 0;
+ vq->signalled_used_valid = false;
+ vq->ready = true;
+
+ dev->ops->enable_queue(dev, vq);
+}
+
+static void vduse_queue_disable(VduseVirtq *vq)
+{
+ struct VduseDev *dev = vq->dev;
+ struct vduse_vq_eventfd eventfd;
+
+ if (!vq->ready) {
+ return;
+ }
+
+ dev->ops->disable_queue(dev, vq);
+
+ eventfd.index = vq->index;
+ eventfd.fd = VDUSE_EVENTFD_DEASSIGN;
+ ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &eventfd);
+ close(vq->fd);
+
+ assert(vq->inuse == 0);
+
+ vq->vring.num = 0;
+ vq->vring.desc_addr = 0;
+ vq->vring.avail_addr = 0;
+ vq->vring.used_addr = 0;
+ vq->vring.desc = 0;
+ vq->vring.avail = 0;
+ vq->vring.used = 0;
+ vq->ready = false;
+ vq->fd = -1;
+}
+
+static void vduse_dev_start_dataplane(VduseDev *dev)
+{
+ int i;
+
+ if (ioctl(dev->fd, VDUSE_DEV_GET_FEATURES, &dev->features)) {
+ fprintf(stderr, "Failed to get features: %s\n", strerror(errno));
+ return;
+ }
+ assert(vduse_dev_has_feature(dev, VIRTIO_F_VERSION_1));
+
+ for (i = 0; i < dev->num_queues; i++) {
+ vduse_queue_enable(&dev->vqs[i]);
+ }
+}
+
+static void vduse_dev_stop_dataplane(VduseDev *dev)
+{
+ int i;
+
+ for (i = 0; i < dev->num_queues; i++) {
+ vduse_queue_disable(&dev->vqs[i]);
+ }
+ dev->features = 0;
+ vduse_iova_remove_region(dev, 0, ULONG_MAX);
+}
+
+int vduse_dev_handler(VduseDev *dev)
+{
+ struct vduse_dev_request req;
+ struct vduse_dev_response resp = { 0 };
+ VduseVirtq *vq;
+ int i, ret;
+
+ ret = read(dev->fd, &req, sizeof(req));
+ if (ret != sizeof(req)) {
+ fprintf(stderr, "Read request error [%d]: %s\n",
+ ret, strerror(errno));
+ return -errno;
+ }
+ resp.request_id = req.request_id;
+
+ switch (req.type) {
+ case VDUSE_GET_VQ_STATE:
+ vq = &dev->vqs[req.vq_state.index];
+ resp.vq_state.split.avail_index = vq->last_avail_idx;
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
+ case VDUSE_SET_STATUS:
+ if (req.s.status & VIRTIO_CONFIG_S_DRIVER_OK) {
+ vduse_dev_start_dataplane(dev);
+ } else if (req.s.status == 0) {
+ vduse_dev_stop_dataplane(dev);
+ }
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
+ case VDUSE_UPDATE_IOTLB:
+ /* The iova will be updated by iova_to_va() later, so just remove it */
+ vduse_iova_remove_region(dev, req.iova.start, req.iova.last);
+ for (i = 0; i < dev->num_queues; i++) {
+ vq = &dev->vqs[i];
+ if (vq->ready) {
+ if (vduse_queue_update_vring(vq, vq->vring.desc_addr,
+ vq->vring.avail_addr,
+ vq->vring.used_addr)) {
+ fprintf(stderr, "Failed to update vring for vq[%d]\n",
+ vq->index);
+ }
+ }
+ }
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
+ default:
+ resp.result = VDUSE_REQ_RESULT_FAILED;
+ break;
+ }
+
+ ret = write(dev->fd, &resp, sizeof(resp));
+ if (ret != sizeof(resp)) {
+ fprintf(stderr, "Write request %d error [%d]: %s\n",
+ req.type, ret, strerror(errno));
+ return -errno;
+ }
+ return 0;
+}
+
+int vduse_dev_update_config(VduseDev *dev, uint32_t size,
+ uint32_t offset, char *buffer)
+{
+ int ret;
+ struct vduse_config_data *data;
+
+ data = malloc(offsetof(struct vduse_config_data, buffer) + size);
+ if (!data) {
+ return -ENOMEM;
+ }
+
+ data->offset = offset;
+ data->length = size;
+ memcpy(data->buffer, buffer, size);
+
+ ret = ioctl(dev->fd, VDUSE_DEV_SET_CONFIG, data);
+ free(data);
+
+ if (ret) {
+ return -errno;
+ }
+
+ if (ioctl(dev->fd, VDUSE_DEV_INJECT_CONFIG_IRQ)) {
+ return -errno;
+ }
+
+ return 0;
+}
+
+int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size)
+{
+ VduseVirtq *vq = &dev->vqs[index];
+ struct vduse_vq_config vq_config = { 0 };
+
+ if (max_size > VIRTQUEUE_MAX_SIZE) {
+ return -EINVAL;
+ }
+
+ vq_config.index = vq->index;
+ vq_config.max_size = max_size;
+
+ if (ioctl(dev->fd, VDUSE_VQ_SETUP, &vq_config)) {
+ return -errno;
+ }
+
+ return 0;
+}
+
+static int vduse_dev_init_vqs(VduseDev *dev, uint16_t num_queues)
+{
+ VduseVirtq *vqs;
+ int i;
+
+ vqs = calloc(num_queues, sizeof(VduseVirtq));
+ if (!vqs) {
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < num_queues; i++) {
+ vqs[i].index = i;
+ vqs[i].dev = dev;
+ vqs[i].fd = -1;
+ }
+ dev->vqs = vqs;
+
+ return 0;
+}
+
+static int vduse_dev_init(VduseDev *dev, const char *name,
+ uint16_t num_queues, const VduseOps *ops,
+ void *priv)
+{
+ char *dev_path, *dev_name;
+ int ret, fd;
+
+ dev_path = malloc(strlen(name) + strlen("/dev/vduse/") + 1);
+ if (!dev_path) {
+ return -ENOMEM;
+ }
+ sprintf(dev_path, "/dev/vduse/%s", name);
+
+ fd = open(dev_path, O_RDWR);
+ free(dev_path);
+ if (fd < 0) {
+ fprintf(stderr, "Failed to open vduse dev %s: %s\n",
+ name, strerror(errno));
+ return -errno;
+ }
+
+ dev_name = strdup(name);
+ if (!dev_name) {
+ close(fd);
+ return -ENOMEM;
+ }
+
+ ret = vduse_dev_init_vqs(dev, num_queues);
+ if (ret) {
+ free(dev_name);
+ close(fd);
+ return ret;
+ }
+
+ dev->name = dev_name;
+ dev->num_queues = num_queues;
+ dev->fd = fd;
+ dev->ops = ops;
+ dev->priv = priv;
+
+ return 0;
+}
+
+static inline bool vduse_name_is_valid(const char *name)
+{
+ return strlen(name) >= VDUSE_NAME_MAX || strstr(name, "..");
+}
+
+VduseDev *vduse_dev_create_by_fd(int fd, uint16_t num_queues,
+ const VduseOps *ops, void *priv)
+{
+ VduseDev *dev;
+ int ret;
+
+ if (!ops || !ops->enable_queue || !ops->disable_queue) {
+ fprintf(stderr, "Invalid parameter for vduse\n");
+ return NULL;
+ }
+
+ dev = calloc(1, sizeof(VduseDev));
+ if (!dev) {
+ fprintf(stderr, "Failed to allocate vduse device\n");
+ return NULL;
+ }
+
+ ret = vduse_dev_init_vqs(dev, num_queues);
+ if (ret) {
+ fprintf(stderr, "Failed to init vqs\n");
+ free(dev);
+ return NULL;
+ }
+
+ dev->num_queues = num_queues;
+ dev->fd = fd;
+ dev->ops = ops;
+ dev->priv = priv;
+
+ return dev;
+}
+
+VduseDev *vduse_dev_create_by_name(const char *name, uint16_t num_queues,
+ const VduseOps *ops, void *priv)
+{
+ VduseDev *dev;
+ int ret;
+
+ if (!name || vduse_name_is_valid(name) || !ops ||
+ !ops->enable_queue || !ops->disable_queue) {
+ fprintf(stderr, "Invalid parameter for vduse\n");
+ return NULL;
+ }
+
+ dev = calloc(1, sizeof(VduseDev));
+ if (!dev) {
+ fprintf(stderr, "Failed to allocate vduse device\n");
+ return NULL;
+ }
+
+ ret = vduse_dev_init(dev, name, num_queues, ops, priv);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to init vduse device %s: %s\n",
+ name, strerror(-ret));
+ free(dev);
+ return NULL;
+ }
+
+ return dev;
+}
+
+VduseDev *vduse_dev_create(const char *name, uint32_t device_id,
+ uint32_t vendor_id, uint64_t features,
+ uint16_t num_queues, uint32_t config_size,
+ char *config, const VduseOps *ops, void *priv)
+{
+ VduseDev *dev;
+ int ret, ctrl_fd;
+ uint64_t version;
+ struct vduse_dev_config *dev_config;
+ size_t size = offsetof(struct vduse_dev_config, config);
+
+ if (!name || vduse_name_is_valid(name) ||
+ !has_feature(features, VIRTIO_F_VERSION_1) || !config ||
+ !config_size || !ops || !ops->enable_queue || !ops->disable_queue) {
+ fprintf(stderr, "Invalid parameter for vduse\n");
+ return NULL;
+ }
+
+ dev = calloc(1, sizeof(VduseDev));
+ if (!dev) {
+ fprintf(stderr, "Failed to allocate vduse device\n");
+ return NULL;
+ }
+
+ ctrl_fd = open("/dev/vduse/control", O_RDWR);
+ if (ctrl_fd < 0) {
+ fprintf(stderr, "Failed to open /dev/vduse/control: %s\n",
+ strerror(errno));
+ goto err_ctrl;
+ }
+
+ version = VDUSE_API_VERSION;
+ if (ioctl(ctrl_fd, VDUSE_SET_API_VERSION, &version)) {
+ fprintf(stderr, "Failed to set api version %lu: %s\n",
+ version, strerror(errno));
+ goto err_dev;
+ }
+
+ dev_config = calloc(size + config_size, 1);
+ if (!dev_config) {
+ fprintf(stderr, "Failed to allocate config space\n");
+ goto err_dev;
+ }
+
+ strcpy(dev_config->name, name);
+ dev_config->device_id = device_id;
+ dev_config->vendor_id = vendor_id;
+ dev_config->features = features;
+ dev_config->vq_num = num_queues;
+ dev_config->vq_align = VDUSE_VQ_ALIGN;
+ dev_config->config_size = config_size;
+ memcpy(dev_config->config, config, config_size);
+
+ ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, dev_config);
+ free(dev_config);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to create vduse device %s: %s\n",
+ name, strerror(errno));
+ goto err_dev;
+ }
+ dev->ctrl_fd = ctrl_fd;
+
+ ret = vduse_dev_init(dev, name, num_queues, ops, priv);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to init vduse device %s: %s\n",
+ name, strerror(-ret));
+ goto err;
+ }
+
+ return dev;
+err:
+ ioctl(ctrl_fd, VDUSE_DESTROY_DEV, name);
+err_dev:
+ close(ctrl_fd);
+err_ctrl:
+ free(dev);
+
+ return NULL;
+}
+
+int vduse_dev_destroy(VduseDev *dev)
+{
+ int ret = 0;
+
+ free(dev->vqs);
+ if (dev->fd > 0) {
+ close(dev->fd);
+ dev->fd = -1;
+ }
+ if (dev->ctrl_fd > 0) {
+ if (ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name)) {
+ ret = -errno;
+ }
+ close(dev->ctrl_fd);
+ dev->ctrl_fd = -1;
+ }
+ free(dev->name);
+ free(dev);
+
+ return ret;
+}
diff --git a/subprojects/libvduse/libvduse.h b/subprojects/libvduse/libvduse.h
new file mode 100644
index 0000000000..6c2fe98213
--- /dev/null
+++ b/subprojects/libvduse/libvduse.h
@@ -0,0 +1,235 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *
+ * Author:
+ * Xie Yongji <xieyongji@bytedance.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef LIBVDUSE_H
+#define LIBVDUSE_H
+
+#include <stdint.h>
+#include <sys/uio.h>
+
+#define VIRTQUEUE_MAX_SIZE 1024
+
+/* VDUSE device structure */
+typedef struct VduseDev VduseDev;
+
+/* Virtqueue structure */
+typedef struct VduseVirtq VduseVirtq;
+
+/* Operations of the VDUSE backend */
+typedef struct VduseOps {
+ /* Called when virtqueue can be processed */
+ void (*enable_queue)(VduseDev *dev, VduseVirtq *vq);
+ /* Called when virtqueue processing should be stopped */
+ void (*disable_queue)(VduseDev *dev, VduseVirtq *vq);
+} VduseOps;
+
+/* Describes the elements of an I/O buffer */
+typedef struct VduseVirtqElement {
+ /* Descriptor table index */
+ unsigned int index;
+ /* Number of physically-contiguous device-readable descriptors */
+ unsigned int out_num;
+ /* Number of physically-contiguous device-writable descriptors */
+ unsigned int in_num;
+ /* Array to store physically-contiguous device-writable descriptors */
+ struct iovec *in_sg;
+ /* Array to store physically-contiguous device-readable descriptors */
+ struct iovec *out_sg;
+} VduseVirtqElement;
+
+
+/**
+ * vduse_get_virtio_features:
+ *
+ * Get supported virtio features
+ *
+ * Returns: supported feature bits
+ */
+uint64_t vduse_get_virtio_features(void);
+
+/**
+ * vduse_queue_get_dev:
+ * @vq: specified virtqueue
+ *
+ * Get corresponding VDUSE device from the virtqueue.
+ *
+ * Returns: a pointer to VDUSE device on success, NULL on failure.
+ */
+VduseDev *vduse_queue_get_dev(VduseVirtq *vq);
+
+/**
+ * vduse_queue_get_fd:
+ * @vq: specified virtqueue
+ *
+ * Get the kick fd for the virtqueue.
+ *
+ * Returns: file descriptor on success, -1 on failure.
+ */
+int vduse_queue_get_fd(VduseVirtq *vq);
+
+/**
+ * vduse_queue_pop:
+ * @vq: specified virtqueue
+ * @sz: the size of struct to return (must be >= sizeof(VduseVirtqElement))
+ *
+ * Pop an element from virtqueue available ring.
+ *
+ * Returns: a pointer to a structure containing VduseVirtqElement on success,
+ * NULL on failure.
+ */
+void *vduse_queue_pop(VduseVirtq *vq, size_t sz);
+
+/**
+ * vduse_queue_push:
+ * @vq: specified virtqueue
+ * @elem: pointer to VduseVirtqElement returned by vduse_queue_pop()
+ * @len: length in bytes to write
+ *
+ * Push an element to virtqueue used ring.
+ */
+void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem,
+ unsigned int len);
+/**
+ * vduse_queue_notify:
+ * @vq: specified virtqueue
+ *
+ * Request to notify the queue.
+ */
+void vduse_queue_notify(VduseVirtq *vq);
+
+/**
+ * vduse_dev_get_priv:
+ * @dev: VDUSE device
+ *
+ * Get the private pointer passed to vduse_dev_create().
+ *
+ * Returns: private pointer on success, NULL on failure.
+ */
+void *vduse_dev_get_priv(VduseDev *dev);
+
+/**
+ * vduse_dev_get_queue:
+ * @dev: VDUSE device
+ * @index: virtqueue index
+ *
+ * Get the specified virtqueue.
+ *
+ * Returns: a pointer to the virtqueue on success, NULL on failure.
+ */
+VduseVirtq *vduse_dev_get_queue(VduseDev *dev, int index);
+
+/**
+ * vduse_dev_get_fd:
+ * @dev: VDUSE device
+ *
+ * Get the control message fd for the VDUSE device.
+ *
+ * Returns: file descriptor on success, -1 on failure.
+ */
+int vduse_dev_get_fd(VduseDev *dev);
+
+/**
+ * vduse_dev_handler:
+ * @dev: VDUSE device
+ *
+ * Used to process the control message.
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+int vduse_dev_handler(VduseDev *dev);
+
+/**
+ * vduse_dev_update_config:
+ * @dev: VDUSE device
+ * @size: the size to write to configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Update device configuration space and inject a config interrupt.
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+int vduse_dev_update_config(VduseDev *dev, uint32_t size,
+ uint32_t offset, char *buffer);
+
+/**
+ * vduse_dev_setup_queue:
+ * @dev: VDUSE device
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ *
+ * Setup the specified virtqueue.
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size);
+
+/**
+ * vduse_dev_create_by_fd:
+ * @fd: passed file descriptor
+ * @num_queues: the number of virtqueues
+ * @ops: the operation of VDUSE backend
+ * @priv: private pointer
+ *
+ * Create VDUSE device from a passed file descriptor.
+ *
+ * Returns: pointer to VDUSE device on success, NULL on failure.
+ */
+VduseDev *vduse_dev_create_by_fd(int fd, uint16_t num_queues,
+ const VduseOps *ops, void *priv);
+
+/**
+ * vduse_dev_create_by_name:
+ * @name: VDUSE device name
+ * @num_queues: the number of virtqueues
+ * @ops: the operation of VDUSE backend
+ * @priv: private pointer
+ *
+ * Create VDUSE device on /dev/vduse/$NAME.
+ *
+ * Returns: pointer to VDUSE device on success, NULL on failure.
+ */
+VduseDev *vduse_dev_create_by_name(const char *name, uint16_t num_queues,
+ const VduseOps *ops, void *priv);
+
+/**
+ * vduse_dev_create:
+ * @name: VDUSE device name
+ * @device_id: virtio device id
+ * @vendor_id: virtio vendor id
+ * @features: virtio features
+ * @num_queues: the number of virtqueues
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ * @ops: the operation of VDUSE backend
+ * @priv: private pointer
+ *
+ * Create VDUSE device.
+ *
+ * Returns: pointer to VDUSE device on success, NULL on failure.
+ */
+VduseDev *vduse_dev_create(const char *name, uint32_t device_id,
+ uint32_t vendor_id, uint64_t features,
+ uint16_t num_queues, uint32_t config_size,
+ char *config, const VduseOps *ops, void *priv);
+
+/**
+ * vduse_dev_destroy:
+ * @dev: VDUSE device
+ *
+ * Destroy the VDUSE device.
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+int vduse_dev_destroy(VduseDev *dev);
+
+#endif
diff --git a/subprojects/libvduse/linux-headers/linux b/subprojects/libvduse/linux-headers/linux
new file mode 120000
index 0000000000..04f3304f79
--- /dev/null
+++ b/subprojects/libvduse/linux-headers/linux
@@ -0,0 +1 @@
+../../../linux-headers/linux/
\ No newline at end of file
diff --git a/subprojects/libvduse/meson.build b/subprojects/libvduse/meson.build
new file mode 100644
index 0000000000..ba08f5ee1a
--- /dev/null
+++ b/subprojects/libvduse/meson.build
@@ -0,0 +1,10 @@
+project('libvduse', 'c',
+ license: 'GPL-2.0-or-later',
+ default_options: ['c_std=gnu99'])
+
+libvduse = static_library('vduse',
+ files('libvduse.c'),
+ c_args: '-D_GNU_SOURCE')
+
+libvduse_dep = declare_dependency(link_with: libvduse,
+ include_directories: include_directories('.'))
diff --git a/subprojects/libvduse/standard-headers/linux b/subprojects/libvduse/standard-headers/linux
new file mode 120000
index 0000000000..c416f068ac
--- /dev/null
+++ b/subprojects/libvduse/standard-headers/linux
@@ -0,0 +1 @@
+../../../include/standard-headers/linux/
\ No newline at end of file
--
2.20.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v4 4/6] vduse-blk: implements vduse-blk export
2022-04-06 7:59 [PATCH v4 0/6] Support exporting BDSs via VDUSE Xie Yongji
` (2 preceding siblings ...)
2022-04-06 7:59 ` [PATCH v4 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library Xie Yongji
@ 2022-04-06 7:59 ` Xie Yongji
2022-04-26 17:03 ` Kevin Wolf
2022-04-06 7:59 ` [PATCH v4 5/6] vduse-blk: Add vduse-blk resize support Xie Yongji
2022-04-06 7:59 ` [PATCH v4 6/6] libvduse: Add support for reconnecting Xie Yongji
5 siblings, 1 reply; 11+ messages in thread
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
This implements a VDUSE block backend based on
the libvduse library. We can use it to export the BDSs
for both VM and container (host) usage.
The new command-line syntax is:
$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:
$ vdpa dev add name vduse-export0 mgmtdev vduse
The device must also be removed via the "vdpa" command
before we stop qemu-storage-daemon.
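The removal step could look like the sketch below. It assumes the
vDPA device name matches the export id used above.
teardown_vduse_export and DRY_RUN are hypothetical helpers added
only for illustration, and stopping the daemon with SIGTERM is an
assumption about the deployment, not something this patch mandates:

```shell
# Sketch only: teardown_vduse_export and DRY_RUN are hypothetical,
# not part of QEMU or iproute2.
teardown_vduse_export() {
    name="$1"   # vDPA device name, same as the export id
    pid="$2"    # PID of the qemu-storage-daemon process
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"   # print the commands instead of running them
    fi
    # Detach the device from the vDPA bus first ...
    $run vdpa dev del "$name" || return 1
    # ... and only then ask the daemon to exit (assumed: SIGTERM).
    $run kill -TERM "$pid"
}
```

With DRY_RUN=1 the function only prints the two commands, which is a
cheap way to check the ordering before running it for real.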
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
MAINTAINERS | 4 +-
block/export/export.c | 6 +
block/export/meson.build | 5 +
block/export/vduse-blk.c | 425 ++++++++++++++++++++++++++++++++++
block/export/vduse-blk.h | 20 ++
meson.build | 13 ++
meson_options.txt | 2 +
qapi/block-export.json | 25 +-
scripts/meson-buildoptions.sh | 4 +
9 files changed, 501 insertions(+), 3 deletions(-)
create mode 100644 block/export/vduse-blk.c
create mode 100644 block/export/vduse-blk.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 53a14bf7a8..9d9f68479f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3547,10 +3547,12 @@ L: qemu-block@nongnu.org
S: Supported
F: block/export/fuse.c
-VDUSE library
+VDUSE library and block device exports
M: Xie Yongji <xieyongji@bytedance.com>
S: Maintained
F: subprojects/libvduse/
+F: block/export/vduse-blk.c
+F: block/export/vduse-blk.h
Replication
M: Wen Congyang <wencongyang2@huawei.com>
diff --git a/block/export/export.c b/block/export/export.c
index 7253af3bc3..4744862915 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
#ifdef CONFIG_VHOST_USER_BLK_SERVER
#include "vhost-user-blk-server.h"
#endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
static const BlockExportDriver *blk_exp_drivers[] = {
&blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
#ifdef CONFIG_FUSE
&blk_exp_fuse,
#endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+ &blk_exp_vduse_blk,
+#endif
};
/* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
endif
blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+ blockdev_ss.add(files('vduse-blk.c'))
+ blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 0000000000..3f4e0df34b
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,425 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ * Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ * Xie Yongji <xieyongji@bytedance.com>
+ * Coiby Xu <coiby.xu@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include <sys/eventfd.h>
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+ BlockExport export;
+ VduseDev *dev;
+ uint16_t num_queues;
+ uint32_t blk_size;
+ bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+ unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+ VduseVirtqElement elem;
+ int64_t sector_num;
+ size_t in_len;
+ struct virtio_blk_inhdr *in;
+ struct virtio_blk_outhdr out;
+ VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+ vduse_queue_push(req->vq, &req->elem, req->in_len);
+ vduse_queue_notify(req->vq);
+
+ free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+ uint64_t sector, size_t size)
+{
+ uint64_t nb_sectors;
+ uint64_t total_sectors;
+
+ if (size % VIRTIO_BLK_SECTOR_SIZE) {
+ return false;
+ }
+
+ nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+ QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+ if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+ return false;
+ }
+ if ((sector << VIRTIO_BLK_SECTOR_BITS) % vblk_exp->blk_size) {
+ return false;
+ }
+ blk_get_geometry(vblk_exp->export.blk, &total_sectors);
+ if (sector > total_sectors || nb_sectors > total_sectors - sector) {
+ return false;
+ }
+ return true;
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+ VduseBlkReq *req = opaque;
+ VduseVirtq *vq = req->vq;
+ VduseDev *dev = vduse_queue_get_dev(vq);
+ VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+ BlockBackend *blk = vblk_exp->export.blk;
+ VduseVirtqElement *elem = &req->elem;
+ struct iovec *in_iov = elem->in_sg;
+ struct iovec *out_iov = elem->out_sg;
+ unsigned in_num = elem->in_num;
+ unsigned out_num = elem->out_num;
+ uint32_t type;
+
+ if (elem->out_num < 1 || elem->in_num < 1) {
+ error_report("virtio-blk request missing headers");
+ goto err;
+ }
+
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
+ sizeof(req->out)) != sizeof(req->out))) {
+ error_report("virtio-blk request outhdr too short");
+ goto err;
+ }
+
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
+
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
+ error_report("virtio-blk request inhdr too short");
+ goto err;
+ }
+
+ /* We always touch the last byte, so just see how big in_iov is. */
+ req->in_len = iov_size(in_iov, in_num);
+ req->in = (void *)in_iov[in_num - 1].iov_base
+ + in_iov[in_num - 1].iov_len
+ - sizeof(struct virtio_blk_inhdr);
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
+
+ type = le32_to_cpu(req->out.type);
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
+ case VIRTIO_BLK_T_IN:
+ case VIRTIO_BLK_T_OUT: {
+ QEMUIOVector qiov;
+ int64_t offset;
+ ssize_t ret = 0;
+ bool is_write = type & VIRTIO_BLK_T_OUT;
+ req->sector_num = le64_to_cpu(req->out.sector);
+
+ if (is_write && !vblk_exp->writable) {
+ req->in->status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ if (is_write) {
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
+ } else {
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
+ }
+
+ if (unlikely(!vduse_blk_sect_range_ok(vblk_exp,
+ req->sector_num,
+ qiov.size))) {
+ req->in->status = VIRTIO_BLK_S_IOERR;
+ break;
+ }
+
+ offset = req->sector_num << VIRTIO_BLK_SECTOR_BITS;
+
+ if (is_write) {
+ ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
+ } else {
+ ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
+ }
+ if (ret >= 0) {
+ req->in->status = VIRTIO_BLK_S_OK;
+ } else {
+ req->in->status = VIRTIO_BLK_S_IOERR;
+ }
+ break;
+ }
+ case VIRTIO_BLK_T_FLUSH:
+ if (blk_co_flush(blk) == 0) {
+ req->in->status = VIRTIO_BLK_S_OK;
+ } else {
+ req->in->status = VIRTIO_BLK_S_IOERR;
+ }
+ break;
+ case VIRTIO_BLK_T_GET_ID: {
+ size_t size = MIN(strlen(vblk_exp->export.id) + 1,
+ MIN(iov_size(in_iov, in_num),
+ VIRTIO_BLK_ID_BYTES));
+ iov_from_buf(in_iov, in_num, 0, vblk_exp->export.id, size);
+ req->in->status = VIRTIO_BLK_S_OK;
+ break;
+ }
+ default:
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
+ break;
+ }
+
+ vduse_blk_req_complete(req);
+ return;
+
+err:
+ free(req);
+}
+
+static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq)
+{
+ while (1) {
+ VduseBlkReq *req;
+
+ req = vduse_queue_pop(vq, sizeof(VduseBlkReq));
+ if (!req) {
+ break;
+ }
+ req->vq = vq;
+
+ Coroutine *co =
+ qemu_coroutine_create(vduse_blk_virtio_process_req, req);
+ qemu_coroutine_enter(co);
+ }
+}
+
+static void on_vduse_vq_kick(void *opaque)
+{
+ VduseVirtq *vq = opaque;
+ VduseDev *dev = vduse_queue_get_dev(vq);
+ int fd = vduse_queue_get_fd(vq);
+ eventfd_t kick_data;
+
+ if (eventfd_read(fd, &kick_data) == -1) {
+ error_report("failed to read data from eventfd");
+ return;
+ }
+
+ vduse_blk_vq_handler(dev, vq);
+}
+
+static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
+{
+ VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+
+ aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
+ true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+}
+
+static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
+{
+ VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+
+ aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
+ true, NULL, NULL, NULL, NULL, NULL);
+}
+
+static const VduseOps vduse_blk_ops = {
+ .enable_queue = vduse_blk_enable_queue,
+ .disable_queue = vduse_blk_disable_queue,
+};
+
+static void on_vduse_dev_kick(void *opaque)
+{
+ VduseDev *dev = opaque;
+
+ vduse_dev_handler(dev);
+}
+
+static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
+{
+ int i;
+
+ aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
+ true, on_vduse_dev_kick, NULL, NULL, NULL,
+ vblk_exp->dev);
+
+ for (i = 0; i < vblk_exp->num_queues; i++) {
+ VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+ int fd = vduse_queue_get_fd(vq);
+
+ if (fd < 0) {
+ continue;
+ }
+ aio_set_fd_handler(vblk_exp->export.ctx, fd, true,
+ on_vduse_vq_kick, NULL, NULL, NULL, vq);
+ }
+}
+
+static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
+{
+ int i;
+
+ for (i = 0; i < vblk_exp->num_queues; i++) {
+ VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+ int fd = vduse_queue_get_fd(vq);
+
+ if (fd < 0) {
+ continue;
+ }
+ aio_set_fd_handler(vblk_exp->export.ctx, fd,
+ true, NULL, NULL, NULL, NULL, NULL);
+ }
+ aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
+ true, NULL, NULL, NULL, NULL, NULL);
+}
+
+
+static void blk_aio_attached(AioContext *ctx, void *opaque)
+{
+ VduseBlkExport *vblk_exp = opaque;
+
+ vblk_exp->export.ctx = ctx;
+ vduse_blk_attach_ctx(vblk_exp, ctx);
+}
+
+static void blk_aio_detach(void *opaque)
+{
+ VduseBlkExport *vblk_exp = opaque;
+
+ vduse_blk_detach_ctx(vblk_exp);
+ vblk_exp->export.ctx = NULL;
+}
+
+static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
+ Error **errp)
+{
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+ BlockExportOptionsVduseBlk *vblk_opts = &opts->u.vduse_blk;
+ uint64_t logical_block_size = VIRTIO_BLK_SECTOR_SIZE;
+ uint16_t num_queues = VDUSE_DEFAULT_NUM_QUEUE;
+ uint16_t queue_size = VDUSE_DEFAULT_QUEUE_SIZE;
+ Error *local_err = NULL;
+ struct virtio_blk_config config = { 0 };
+ uint64_t features;
+ int i;
+
+ if (vblk_opts->has_num_queues) {
+ num_queues = vblk_opts->num_queues;
+ if (num_queues == 0) {
+ error_setg(errp, "num-queues must be greater than 0");
+ return -EINVAL;
+ }
+ }
+
+ if (vblk_opts->has_queue_size) {
+ queue_size = vblk_opts->queue_size;
+ if (queue_size <= 2 || !is_power_of_2(queue_size) ||
+ queue_size > VIRTQUEUE_MAX_SIZE) {
+ error_setg(errp, "queue-size is invalid");
+ return -EINVAL;
+ }
+ }
+
+ if (vblk_opts->has_logical_block_size) {
+ logical_block_size = vblk_opts->logical_block_size;
+ check_block_size(exp->id, "logical-block-size", logical_block_size,
+ &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return -EINVAL;
+ }
+ }
+ blk_set_guest_block_size(exp->blk, logical_block_size);
+
+ vblk_exp->blk_size = logical_block_size;
+ vblk_exp->writable = opts->writable;
+ vblk_exp->num_queues = num_queues;
+
+ config.capacity =
+ cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+ config.seg_max = cpu_to_le32(queue_size - 2);
+ config.size_max = cpu_to_le32(0);
+ config.min_io_size = cpu_to_le16(1);
+ config.opt_io_size = cpu_to_le32(1);
+ config.num_queues = cpu_to_le16(num_queues);
+ config.blk_size = cpu_to_le32(logical_block_size);
+
+ features = vduse_get_virtio_features() |
+ (1ULL << VIRTIO_BLK_F_SIZE_MAX) |
+ (1ULL << VIRTIO_BLK_F_SEG_MAX) |
+ (1ULL << VIRTIO_BLK_F_TOPOLOGY) |
+ (1ULL << VIRTIO_BLK_F_BLK_SIZE);
+
+ if (num_queues > 1) {
+ features |= 1ULL << VIRTIO_BLK_F_MQ;
+ }
+ if (!vblk_exp->writable) {
+ features |= 1ULL << VIRTIO_BLK_F_RO;
+ }
+
+ vblk_exp->dev = vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0,
+ features, num_queues,
+ sizeof(struct virtio_blk_config),
+ (char *)&config, &vduse_blk_ops,
+ vblk_exp);
+ if (!vblk_exp->dev) {
+ error_setg(errp, "failed to create vduse device");
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < num_queues; i++) {
+ vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
+ }
+
+ aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true,
+ on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
+
+ blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
+ vblk_exp);
+
+ return 0;
+}
+
+static void vduse_blk_exp_delete(BlockExport *exp)
+{
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
+ vblk_exp);
+ vduse_dev_destroy(vblk_exp->dev);
+}
+
+static void vduse_blk_exp_request_shutdown(BlockExport *exp)
+{
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+ vduse_blk_detach_ctx(vblk_exp);
+}
+
+const BlockExportDriver blk_exp_vduse_blk = {
+ .type = BLOCK_EXPORT_TYPE_VDUSE_BLK,
+ .instance_size = sizeof(VduseBlkExport),
+ .create = vduse_blk_exp_create,
+ .delete = vduse_blk_exp_delete,
+ .request_shutdown = vduse_blk_exp_request_shutdown,
+};
diff --git a/block/export/vduse-blk.h b/block/export/vduse-blk.h
new file mode 100644
index 0000000000..c4eeb1b70e
--- /dev/null
+++ b/block/export/vduse-blk.h
@@ -0,0 +1,20 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *
+ * Author:
+ * Xie Yongji <xieyongji@bytedance.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef VDUSE_BLK_H
+#define VDUSE_BLK_H
+
+#include "block/export.h"
+
+extern const BlockExportDriver blk_exp_vduse_blk;
+
+#endif /* VDUSE_BLK_H */
diff --git a/meson.build b/meson.build
index 5c71904461..4a079f05f8 100644
--- a/meson.build
+++ b/meson.build
@@ -1366,6 +1366,17 @@ if have_libvduse
libvduse = libvduse_proj.get_variable('libvduse_dep')
endif
+have_vduse_blk_export = (have_libvduse and targetos == 'linux')
+if get_option('vduse_blk_export').enabled()
+ if targetos != 'linux'
+ error('vduse_blk_export requires linux')
+ elif not have_libvduse
+ error('vduse_blk_export requires libvduse support')
+ endif
+elif get_option('vduse_blk_export').disabled()
+ have_vduse_blk_export = false
+endif
+
# libbpf
libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
if libbpf.found() and not cc.links('''
@@ -1569,6 +1580,7 @@ config_host_data.set('CONFIG_TPM', have_tpm)
config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
config_host_data.set('CONFIG_VDE', vde.found())
config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', have_vhost_user_blk_server)
+config_host_data.set('CONFIG_VDUSE_BLK_EXPORT', have_vduse_blk_export)
config_host_data.set('CONFIG_VNC', vnc.found())
config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
config_host_data.set('CONFIG_VNC_PNG', png.found())
@@ -3596,6 +3608,7 @@ if have_block
summary_info += {'qed support': get_option('qed').allowed()}
summary_info += {'parallels support': get_option('parallels').allowed()}
summary_info += {'FUSE exports': fuse}
+ summary_info += {'VDUSE block exports': have_vduse_blk_export}
endif
summary(summary_info, bool_yn: true, section: 'Block layer support')
diff --git a/meson_options.txt b/meson_options.txt
index e25af3277d..60c680134a 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -221,6 +221,8 @@ option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
option('libvduse', type: 'feature', value: 'auto',
description: 'build VDUSE Library')
+option('vduse_blk_export', type: 'feature', value: 'auto',
+ description: 'VDUSE block export support')
option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/qapi/block-export.json b/qapi/block-export.json
index f183522d0d..8d194e90c0 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -170,6 +170,23 @@
'*allow-other': 'FuseExportAllowOther' },
'if': 'CONFIG_FUSE' }
+##
+# @BlockExportOptionsVduseBlk:
+#
+# A vduse-blk block export.
+#
+# @num-queues: the number of virtqueues. Defaults to 1.
+# @queue-size: the size of virtqueue. Defaults to 256.
+# @logical-block-size: Logical block size in bytes. Range [512, PAGE_SIZE]
+# and must be power of 2. Defaults to 512 bytes.
+#
+# Since: 7.1
+##
+{ 'struct': 'BlockExportOptionsVduseBlk',
+ 'data': { '*num-queues': 'uint16',
+ '*queue-size': 'uint16',
+ '*logical-block-size': 'size'} }
+
##
# @NbdServerAddOptions:
#
@@ -273,6 +290,7 @@
# @nbd: NBD export
# @vhost-user-blk: vhost-user-blk export (since 5.2)
# @fuse: FUSE export (since: 6.0)
+# @vduse-blk: vduse-blk export (since 7.1)
#
# Since: 4.2
##
@@ -280,7 +298,8 @@
'data': [ 'nbd',
{ 'name': 'vhost-user-blk',
'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
- { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] }
+ { 'name': 'fuse', 'if': 'CONFIG_FUSE' },
+ { 'name': 'vduse-blk', 'if': 'CONFIG_VDUSE_BLK_EXPORT' } ] }
##
# @BlockExportOptions:
@@ -324,7 +343,9 @@
'vhost-user-blk': { 'type': 'BlockExportOptionsVhostUserBlk',
'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
'fuse': { 'type': 'BlockExportOptionsFuse',
- 'if': 'CONFIG_FUSE' }
+ 'if': 'CONFIG_FUSE' },
+ 'vduse-blk': { 'type': 'BlockExportOptionsVduseBlk',
+ 'if': 'CONFIG_VDUSE_BLK_EXPORT' }
} }
##
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index ccab9ca9da..162f362243 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -117,6 +117,8 @@ meson_options_help() {
printf "%s\n" ' usb-redir libusbredir support'
printf "%s\n" ' vde vde network backend support'
printf "%s\n" ' vdi vdi image format support'
+ printf "%s\n" ' vduse-blk-export'
+ printf "%s\n" ' VDUSE block export support'
printf "%s\n" ' vhost-user-blk-server'
printf "%s\n" ' build vhost-user-blk server'
printf "%s\n" ' virglrenderer virgl rendering support'
@@ -338,6 +340,8 @@ _meson_option_parse() {
--disable-vde) printf "%s" -Dvde=disabled ;;
--enable-vdi) printf "%s" -Dvdi=enabled ;;
--disable-vdi) printf "%s" -Dvdi=disabled ;;
+ --enable-vduse-blk-export) printf "%s" -Dvduse_blk_export=enabled ;;
+ --disable-vduse-blk-export) printf "%s" -Dvduse_blk_export=disabled ;;
--enable-vhost-user-blk-server) printf "%s" -Dvhost_user_blk_server=enabled ;;
--disable-vhost-user-blk-server) printf "%s" -Dvhost_user_blk_server=disabled ;;
--enable-virglrenderer) printf "%s" -Dvirglrenderer=enabled ;;
--
2.20.1
* [PATCH v4 5/6] vduse-blk: Add vduse-blk resize support
2022-04-06 7:59 [PATCH v4 0/6] Support exporting BDSs via VDUSE Xie Yongji
` (3 preceding siblings ...)
2022-04-06 7:59 ` [PATCH v4 4/6] vduse-blk: implements vduse-blk export Xie Yongji
@ 2022-04-06 7:59 ` Xie Yongji
2022-04-06 7:59 ` [PATCH v4 6/6] libvduse: Add support for reconnecting Xie Yongji
5 siblings, 0 replies; 11+ messages in thread
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
To support block resize, the block resize callback uses
vduse_dev_update_config() to update the capacity field in the
configuration space and inject a config interrupt.
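As a rough illustration of the value written into the capacity field, here
is a minimal standalone sketch. It assumes 512-byte sectors (as implied by
VIRTIO_BLK_SECTOR_BITS) and little-endian storage; SECTOR_BITS,
bytes_to_sectors(), store_le64() and encode_capacity() are hypothetical names
used only for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors VIRTIO_BLK_SECTOR_BITS, i.e. 512-byte sectors. */
#define SECTOR_BITS 9

/* Convert a length in bytes to a count of 512-byte sectors (rounds down). */
static uint64_t bytes_to_sectors(uint64_t len_bytes)
{
    return len_bytes >> SECTOR_BITS;
}

/* Store a 64-bit value in little-endian byte order on any host. */
static void store_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: the eight bytes written at the capacity offset. */
static void encode_capacity(uint8_t out[8], uint64_t len_bytes)
{
    store_le64(out, bytes_to_sectors(len_bytes));
}
```

Only these eight bytes change on resize, which is why the callback passes
the offset and size of the capacity member rather than the whole structure.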
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/export/vduse-blk.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3f4e0df34b..e027b2e5ff 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -310,6 +310,23 @@ static void blk_aio_detach(void *opaque)
vblk_exp->export.ctx = NULL;
}
+static void vduse_blk_resize(void *opaque)
+{
+ BlockExport *exp = opaque;
+ VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+ struct virtio_blk_config config;
+
+ config.capacity =
+ cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+ vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+ offsetof(struct virtio_blk_config, capacity),
+ (char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+ .resize_cb = vduse_blk_resize,
+};
+
static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
Error **errp)
{
@@ -397,6 +414,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vblk_exp);
+ blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
return 0;
}
@@ -406,6 +425,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
vblk_exp);
+ blk_set_dev_ops(exp->blk, NULL, NULL);
vduse_dev_destroy(vblk_exp->dev);
}
--
2.20.1
* [PATCH v4 6/6] libvduse: Add support for reconnecting
2022-04-06 7:59 [PATCH v4 0/6] Support exporting BDSs via VDUSE Xie Yongji
` (4 preceding siblings ...)
2022-04-06 7:59 ` [PATCH v4 5/6] vduse-blk: Add vduse-blk resize support Xie Yongji
@ 2022-04-06 7:59 ` Xie Yongji
5 siblings, 0 replies; 11+ messages in thread
From: Xie Yongji @ 2022-04-06 7:59 UTC (permalink / raw)
To: mst, jasowang, stefanha, sgarzare, kwolf, mreitz, mlureau, jsnow, eblake
Cc: qemu-devel, qemu-block
To support reconnecting after a restart or crash, the VDUSE backend
might need to resubmit inflight I/Os. This stores metadata, such as
the indexes of the inflight I/Os' descriptors, in a shm file so that
the VDUSE backend can restore them when reconnecting.
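The bookkeeping described above can be sketched in isolation as follows.
This is a simplified model, not the code from this patch: it uses an
in-memory array instead of a mmap()ed file, omits the barrier between the
two stores, and orders entries by a plain counter comparison without any
wrap-around handling. QUEUE_SIZE, DescState, Resubmit, log_get(), log_put()
and collect_resubmit() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define QUEUE_SIZE 8 /* illustrative only */

/* Per-descriptor state kept in the log (simplified). */
typedef struct {
    uint8_t inflight;  /* 1 while the request is being processed */
    uint64_t counter;  /* submission order */
} DescState;

typedef struct {
    uint16_t index;
    uint64_t counter;
} Resubmit;

/* A request was popped: record its order first, then mark it inflight. */
static void log_get(DescState *log, uint64_t *next_counter, uint16_t head)
{
    log[head].counter = (*next_counter)++;
    log[head].inflight = 1;
}

/* A request was completed and pushed back to the used ring. */
static void log_put(DescState *log, uint16_t head)
{
    log[head].inflight = 0;
}

/* Sort newest first, so that taking entries from the end yields the oldest. */
static int by_counter_desc(const void *a, const void *b)
{
    const Resubmit *x = a, *y = b;

    if (x->counter == y->counter) {
        return 0;
    }
    return x->counter < y->counter ? 1 : -1;
}

/* After a restart: collect every descriptor still marked inflight. */
static size_t collect_resubmit(const DescState *log, Resubmit *out)
{
    size_t n = 0;

    for (uint16_t i = 0; i < QUEUE_SIZE; i++) {
        if (log[i].inflight) {
            out[n].index = i;
            out[n].counter = log[i].counter;
            n++;
        }
    }
    qsort(out, n, sizeof(out[0]), by_counter_desc);
    return n;
}
```

Consuming the list from its tail resubmits requests in their original
submission order, which appears to be the intent of the resubmit handling
added to vduse_queue_pop().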
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
---
block/export/vduse-blk.c | 14 ++
subprojects/libvduse/libvduse.c | 235 +++++++++++++++++++++++++++++++-
subprojects/libvduse/libvduse.h | 12 ++
3 files changed, 256 insertions(+), 5 deletions(-)
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index e027b2e5ff..b24b5aeda9 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -38,6 +38,7 @@ typedef struct VduseBlkExport {
uint16_t num_queues;
uint32_t blk_size;
bool writable;
+ char *recon_file;
} VduseBlkExport;
struct virtio_blk_inhdr {
@@ -233,6 +234,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+ /* Make sure we don't miss any kick after reconnecting */
+ eventfd_write(vduse_queue_get_fd(vq), 1);
}
static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -404,6 +407,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
return -ENOMEM;
}
+ vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+ g_get_tmp_dir(), exp->id);
+ if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+ error_setg(errp, "failed to set reconnect log file");
+ vduse_dev_destroy(vblk_exp->dev);
+ g_free(vblk_exp->recon_file);
+ return -EINVAL;
+ }
+
for (i = 0; i < num_queues; i++) {
vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
}
@@ -427,6 +439,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
vblk_exp);
blk_set_dev_ops(exp->blk, NULL, NULL);
vduse_dev_destroy(vblk_exp->dev);
+ unlink(vblk_exp->recon_file);
+ g_free(vblk_exp->recon_file);
}
static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index ecee9c0568..b27145ceed 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
#define VDUSE_VQ_ALIGN 4096
#define MAX_IOVA_REGIONS 256
+#define LOG_ALIGNMENT 64
+
/* Round number down to multiple */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
@@ -51,6 +53,31 @@
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif
+typedef struct VduseDescStateSplit {
+ uint8_t inflight;
+ uint8_t padding[5];
+ uint16_t next;
+ uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+ uint64_t features;
+ uint16_t version;
+ uint16_t desc_num;
+ uint16_t last_batch_head;
+ uint16_t used_idx;
+ VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+ VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+ uint16_t index;
+ uint64_t counter;
+} VduseVirtqInflightDesc;
+
typedef struct VduseRing {
unsigned int num;
uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
bool ready;
int fd;
VduseDev *dev;
+ VduseVirtqInflightDesc *resubmit_list;
+ uint16_t resubmit_num;
+ uint64_t counter;
+ VduseVirtqLog *log;
};
typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
int fd;
int ctrl_fd;
void *priv;
+ void *log;
};
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+ return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+ sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+ void *ptr = MAP_FAILED;
+ int fd;
+
+ fd = open(filename, O_RDWR | O_CREAT, 0600);
+ if (fd == -1) {
+ return MAP_FAILED;
+ }
+
+ if (ftruncate(fd, size) == -1) {
+ goto out;
+ }
+
+ ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+ close(fd);
+ return ptr;
+}
+
static inline bool has_feature(uint64_t features, unsigned int fbit)
{
assert(fbit < 64);
@@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index)
return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
}
+static int inflight_desc_compare(const void *a, const void *b)
+{
+ VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+ *desc1 = (VduseVirtqInflightDesc *)b;
+
+ if (desc1->counter > desc0->counter &&
+ (desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+ return 1;
+ }
+
+ return -1;
+}
+
+static int vduse_queue_check_inflights(VduseVirtq *vq)
+{
+ int i = 0;
+ VduseDev *dev = vq->dev;
+
+ vq->used_idx = le16toh(vq->vring.used->idx);
+ vq->resubmit_num = 0;
+ vq->resubmit_list = NULL;
+ vq->counter = 0;
+
+ if (unlikely(vq->log->inflight.used_idx != vq->used_idx)) {
+ if (vq->log->inflight.last_batch_head >= VIRTQUEUE_MAX_SIZE) {
+ return -1;
+ }
+
+ vq->log->inflight.desc[vq->log->inflight.last_batch_head].inflight = 0;
+
+ barrier();
+
+ vq->log->inflight.used_idx = vq->used_idx;
+ }
+
+ for (i = 0; i < vq->log->inflight.desc_num; i++) {
+ if (vq->log->inflight.desc[i].inflight == 1) {
+ vq->inuse++;
+ }
+ }
+
+ vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx;
+
+ if (vq->inuse) {
+ vq->resubmit_list = calloc(vq->inuse, sizeof(VduseVirtqInflightDesc));
+ if (!vq->resubmit_list) {
+ return -1;
+ }
+
+ for (i = 0; i < vq->log->inflight.desc_num; i++) {
+ if (vq->log->inflight.desc[i].inflight) {
+ vq->resubmit_list[vq->resubmit_num].index = i;
+ vq->resubmit_list[vq->resubmit_num].counter =
+ vq->log->inflight.desc[i].counter;
+ vq->resubmit_num++;
+ }
+ }
+
+ if (vq->resubmit_num > 1) {
+ qsort(vq->resubmit_list, vq->resubmit_num,
+ sizeof(VduseVirtqInflightDesc), inflight_desc_compare);
+ }
+ vq->counter = vq->resubmit_list[0].counter + 1;
+ }
+
+ vduse_inject_irq(dev, vq->index);
+
+ return 0;
+}
+
+static int vduse_queue_inflight_get(VduseVirtq *vq, int desc_idx)
+{
+ vq->log->inflight.desc[desc_idx].counter = vq->counter++;
+
+ barrier();
+
+ vq->log->inflight.desc[desc_idx].inflight = 1;
+
+ return 0;
+}
+
+static int vduse_queue_inflight_pre_put(VduseVirtq *vq, int desc_idx)
+{
+ vq->log->inflight.last_batch_head = desc_idx;
+
+ return 0;
+}
+
+static int vduse_queue_inflight_post_put(VduseVirtq *vq, int desc_idx)
+{
+ vq->log->inflight.desc[desc_idx].inflight = 0;
+
+ barrier();
+
+ vq->log->inflight.used_idx = vq->used_idx;
+
+ return 0;
+}
+
static void vduse_iova_remove_region(VduseDev *dev, uint64_t start,
uint64_t last)
{
@@ -596,11 +754,24 @@ void *vduse_queue_pop(VduseVirtq *vq, size_t sz)
unsigned int head;
VduseVirtqElement *elem;
VduseDev *dev = vq->dev;
+ int i;
if (unlikely(!vq->vring.avail)) {
return NULL;
}
+ if (unlikely(vq->resubmit_list && vq->resubmit_num > 0)) {
+ i = (--vq->resubmit_num);
+ elem = vduse_queue_map_desc(vq, vq->resubmit_list[i].index, sz);
+
+ if (!vq->resubmit_num) {
+ free(vq->resubmit_list);
+ vq->resubmit_list = NULL;
+ }
+
+ return elem;
+ }
+
if (vduse_queue_empty(vq)) {
return NULL;
}
@@ -628,6 +799,8 @@ void *vduse_queue_pop(VduseVirtq *vq, size_t sz)
vq->inuse++;
+ vduse_queue_inflight_get(vq, head);
+
return elem;
}
@@ -685,7 +858,9 @@ void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem,
unsigned int len)
{
vduse_queue_fill(vq, elem, len, 0);
+ vduse_queue_inflight_pre_put(vq, elem->index);
vduse_queue_flush(vq, 1);
+ vduse_queue_inflight_post_put(vq, elem->index);
}
static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr,
@@ -758,12 +933,15 @@ static void vduse_queue_enable(VduseVirtq *vq)
}
vq->fd = fd;
- vq->shadow_avail_idx = vq->last_avail_idx = vq_info.split.avail_index;
- vq->inuse = 0;
- vq->used_idx = 0;
vq->signalled_used_valid = false;
vq->ready = true;
+ if (vduse_queue_check_inflights(vq)) {
+ fprintf(stderr, "Failed to check inflights for vq[%d]\n", vq->index);
+ close(fd);
+ return;
+ }
+
dev->ops->enable_queue(dev, vq);
}
@@ -813,11 +991,15 @@ static void vduse_dev_start_dataplane(VduseDev *dev)
static void vduse_dev_stop_dataplane(VduseDev *dev)
{
+ size_t log_size = dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE);
int i;
for (i = 0; i < dev->num_queues; i++) {
vduse_queue_disable(&dev->vqs[i]);
}
+ if (dev->log) {
+ memset(dev->log, 0, log_size);
+ }
dev->features = 0;
vduse_iova_remove_region(dev, 0, ULONG_MAX);
}
@@ -926,6 +1108,30 @@ int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size)
return -errno;
}
+ vduse_queue_enable(vq);
+
+ return 0;
+}
+
+int vduse_set_reconnect_log_file(VduseDev *dev, const char *filename)
+{
+
+ size_t log_size = dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE);
+ void *log;
+ int i;
+
+ dev->log = log = vduse_log_get(filename, log_size);
+ if (log == MAP_FAILED) {
+ fprintf(stderr, "Failed to get vduse log\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i < dev->num_queues; i++) {
+ dev->vqs[i].log = log;
+ dev->vqs[i].log->inflight.desc_num = VIRTQUEUE_MAX_SIZE;
+ log = (void *)((char *)log + vduse_vq_log_size(VIRTQUEUE_MAX_SIZE));
+ }
+
return 0;
}
@@ -970,6 +1176,12 @@ static int vduse_dev_init(VduseDev *dev, const char *name,
return -errno;
}
+ if (ioctl(fd, VDUSE_DEV_GET_FEATURES, &dev->features)) {
+ fprintf(stderr, "Failed to get features: %s\n", strerror(errno));
+ close(fd);
+ return -errno;
+ }
+
dev_name = strdup(name);
if (!dev_name) {
close(fd);
@@ -1014,6 +1226,12 @@ VduseDev *vduse_dev_create_by_fd(int fd, uint16_t num_queues,
return NULL;
}
+ if (ioctl(fd, VDUSE_DEV_GET_FEATURES, &dev->features)) {
+ fprintf(stderr, "Failed to get features: %s\n", strerror(errno));
+ free(dev);
+ return NULL;
+ }
+
ret = vduse_dev_init_vqs(dev, num_queues);
if (ret) {
fprintf(stderr, "Failed to init vqs\n");
@@ -1113,7 +1331,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t device_id,
ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, dev_config);
free(dev_config);
- if (ret < 0) {
+ if (ret && errno != EEXIST) {
fprintf(stderr, "Failed to create vduse device %s: %s\n",
name, strerror(errno));
goto err_dev;
@@ -1140,8 +1358,15 @@ err_ctrl:
int vduse_dev_destroy(VduseDev *dev)
{
- int ret = 0;
+ size_t log_size = dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE);
+ int i, ret = 0;
+ if (dev->log) {
+ munmap(dev->log, log_size);
+ }
+ for (i = 0; i < dev->num_queues; i++) {
+ free(dev->vqs[i].resubmit_list);
+ }
free(dev->vqs);
if (dev->fd > 0) {
close(dev->fd);
diff --git a/subprojects/libvduse/libvduse.h b/subprojects/libvduse/libvduse.h
index 6c2fe98213..32f19e7b48 100644
--- a/subprojects/libvduse/libvduse.h
+++ b/subprojects/libvduse/libvduse.h
@@ -173,6 +173,18 @@ int vduse_dev_update_config(VduseDev *dev, uint32_t size,
*/
int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size);
+/**
+ * vduse_set_reconnect_log_file:
+ * @dev: VDUSE device
+ * @filename: filename of reconnect log
+ *
+ * Specify the file to store log for reconnecting. It should
+ * be called before vduse_dev_setup_queue().
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+int vduse_set_reconnect_log_file(VduseDev *dev, const char *filename);
+
/**
* vduse_dev_create_by_fd:
* @fd: passed file descriptor
--
2.20.1
* Re: [PATCH v4 4/6] vduse-blk: implements vduse-blk export
2022-04-06 7:59 ` [PATCH v4 4/6] vduse-blk: implements vduse-blk export Xie Yongji
@ 2022-04-26 17:03 ` Kevin Wolf
2022-04-27 3:11 ` Yongji Xie
0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2022-04-26 17:03 UTC (permalink / raw)
To: Xie Yongji
Cc: qemu-block, mst, eblake, jasowang, qemu-devel, mreitz, mlureau,
stefanha, jsnow, sgarzare
Am 06.04.2022 um 09:59 hat Xie Yongji geschrieben:
> This implements a VDUSE block backend based on
> the libvduse library. We can use it to export the BDSs
> for both VM and container (host) usage.
>
> The new command-line syntax is:
>
> $ qemu-storage-daemon \
> --blockdev file,node-name=drive0,filename=test.img \
> --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
>
> After the qemu-storage-daemon started, we need to use
> the "vdpa" command to attach the device to vDPA bus:
>
> $ vdpa dev add name vduse-export0 mgmtdev vduse
>
> Also the device must be removed via the "vdpa" command
> before we stop the qemu-storage-daemon.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
The request handling code is almost the same as for the vhost-user-blk
export. I wonder if we could share this code instead of copying.
The main difference seems to be that you chose not to support discard
and write_zeroes yet. I'm curious if there is a reason why the
vhost-user-blk code wouldn't work for vdpa there?
> + features = vduse_get_virtio_features() |
> + (1ULL << VIRTIO_BLK_F_SIZE_MAX) |
> + (1ULL << VIRTIO_BLK_F_SEG_MAX) |
> + (1ULL << VIRTIO_BLK_F_TOPOLOGY) |
> + (1ULL << VIRTIO_BLK_F_BLK_SIZE);
> +
> + if (num_queues > 1) {
> + features |= 1ULL << VIRTIO_BLK_F_MQ;
> + }
> + if (!vblk_exp->writable) {
> + features |= 1ULL << VIRTIO_BLK_F_RO;
> + }
VIRTIO_BLK_F_FLUSH seems to be missing even though the flush command is
implemented.
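For illustration, a feature mask that also advertises flush could be
composed as in the sketch below. The BLK_F_* macros are local stand-ins for
the corresponding VIRTIO_BLK_F_* bit numbers from the virtio-blk
specification, and blk_features() is a hypothetical helper, not a function
from this series; the base argument stands for whatever
vduse_get_virtio_features() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VIRTIO_BLK_F_* bit numbers. */
#define BLK_F_SIZE_MAX  1
#define BLK_F_SEG_MAX   2
#define BLK_F_RO        5
#define BLK_F_BLK_SIZE  6
#define BLK_F_FLUSH     9
#define BLK_F_TOPOLOGY  10
#define BLK_F_MQ        12

/* Hypothetical helper: build the device feature mask for the export. */
static uint64_t blk_features(uint64_t base, uint16_t num_queues, bool writable)
{
    uint64_t features = base |
                        (1ULL << BLK_F_SIZE_MAX) |
                        (1ULL << BLK_F_SEG_MAX) |
                        (1ULL << BLK_F_TOPOLOGY) |
                        (1ULL << BLK_F_BLK_SIZE) |
                        (1ULL << BLK_F_FLUSH); /* flush is implemented */

    if (num_queues > 1) {
        features |= 1ULL << BLK_F_MQ;
    }
    if (!writable) {
        features |= 1ULL << BLK_F_RO;
    }
    return features;
}
```

Without the flush bit, a driver has no indication that the device accepts
flush requests, so the existing flush handling would likely go unused.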
(This is not a full review yet, just two or three things I noticed while
having a quick look.)
Kevin
* Re: [PATCH v4 4/6] vduse-blk: implements vduse-blk export
2022-04-26 17:03 ` Kevin Wolf
@ 2022-04-27 3:11 ` Yongji Xie
2022-04-27 13:22 ` Kevin Wolf
0 siblings, 1 reply; 11+ messages in thread
From: Yongji Xie @ 2022-04-27 3:11 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, Michael S. Tsirkin, Eric Blake, Jason Wang,
qemu-devel, mreitz, mlureau, Stefan Hajnoczi, jsnow,
Stefano Garzarella
On Wed, Apr 27, 2022 at 1:03 AM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 06.04.2022 um 09:59 hat Xie Yongji geschrieben:
> > This implements a VDUSE block backend based on
> > the libvduse library. We can use it to export the BDSs
> > for both VM and container (host) usage.
> >
> > The new command-line syntax is:
> >
> > $ qemu-storage-daemon \
> > --blockdev file,node-name=drive0,filename=test.img \
> > --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
> >
> > After the qemu-storage-daemon started, we need to use
> > the "vdpa" command to attach the device to vDPA bus:
> >
> > $ vdpa dev add name vduse-export0 mgmtdev vduse
> >
> > Also the device must be removed via the "vdpa" command
> > before we stop the qemu-storage-daemon.
> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
>
> The request handling code is almost the same as for the vhost-user-blk
> export. I wonder if we could share this code instead of copying.
>
I think we can. Will do it in v5.
> The main difference seems to be that you chose not to support discard
> and write_zeroes yet. I'm curious if there is a reason why the
> vhost-user-blk code wouldn't work for vdpa there?
>
They are different protocols. The data plane is similar, so we can
share some code. But the control plane is different, e.g., vhost-user
can only work for guests but vdpa can work for both guests and hosts.
> > + features = vduse_get_virtio_features() |
> > + (1ULL << VIRTIO_BLK_F_SIZE_MAX) |
> > + (1ULL << VIRTIO_BLK_F_SEG_MAX) |
> > + (1ULL << VIRTIO_BLK_F_TOPOLOGY) |
> > + (1ULL << VIRTIO_BLK_F_BLK_SIZE);
> > +
> > + if (num_queues > 1) {
> > + features |= 1ULL << VIRTIO_BLK_F_MQ;
> > + }
> > + if (!vblk_exp->writable) {
> > + features |= 1ULL << VIRTIO_BLK_F_RO;
> > + }
>
> VIRTIO_BLK_F_FLUSH seems to be missing even though the flush command is
> implemented.
>
Oops. Will fix it.
> (This is not a full review yet, just two or three things I noticed while
> having a quick look.)
>
Thank you for your time!
Thanks,
Yongji
* Re: [PATCH v4 4/6] vduse-blk: implements vduse-blk export
2022-04-27 3:11 ` Yongji Xie
@ 2022-04-27 13:22 ` Kevin Wolf
2022-04-27 13:42 ` Yongji Xie
0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2022-04-27 13:22 UTC (permalink / raw)
To: Yongji Xie
Cc: qemu-block, Michael S. Tsirkin, Eric Blake, Jason Wang,
qemu-devel, mreitz, mlureau, Stefan Hajnoczi, jsnow,
Stefano Garzarella
Am 27.04.2022 um 05:11 hat Yongji Xie geschrieben:
> On Wed, Apr 27, 2022 at 1:03 AM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > Am 06.04.2022 um 09:59 hat Xie Yongji geschrieben:
> > > This implements a VDUSE block backend based on
> > > the libvduse library. We can use it to export the BDSs
> > > for both VM and container (host) usage.
> > >
> > > The new command-line syntax is:
> > >
> > > $ qemu-storage-daemon \
> > > --blockdev file,node-name=drive0,filename=test.img \
> > > --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
> > >
> > > After the qemu-storage-daemon started, we need to use
> > > the "vdpa" command to attach the device to vDPA bus:
> > >
> > > $ vdpa dev add name vduse-export0 mgmtdev vduse
> > >
> > > Also the device must be removed via the "vdpa" command
> > > before we stop the qemu-storage-daemon.
> > >
> > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> >
> > The request handling code is almost the same as for the vhost-user-blk
> > export. I wonder if we could share this code instead of copying.
> >
>
> I think we can. Will do it in v5.
>
> > The main difference seems to be that you chose not to support discard
> > and write_zeroes yet. I'm curious if there is a reason why the
> > vhost-user-blk code wouldn't work for vdpa there?
> >
>
> They are different protocols. The data plane is similar, so we can
> share some code. But the control plane is different, e.g., vhost-user
> can only work for guests but vdpa can work for both guests and hosts.
Yes, sure, but discard/write_zeroes are part of the data plane, no?
You're already sharing (or at the moment copying) the code for the other
request types mostly unchanged, so I wondered what is different about
discard/write_zeroes.
Kevin
* Re: [PATCH v4 4/6] vduse-blk: implements vduse-blk export
2022-04-27 13:22 ` Kevin Wolf
@ 2022-04-27 13:42 ` Yongji Xie
0 siblings, 0 replies; 11+ messages in thread
From: Yongji Xie @ 2022-04-27 13:42 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, Michael S. Tsirkin, Eric Blake, Jason Wang,
qemu-devel, mreitz, mlureau, Stefan Hajnoczi, jsnow,
Stefano Garzarella
On Wed, Apr 27, 2022 at 9:22 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 27.04.2022 um 05:11 hat Yongji Xie geschrieben:
> > On Wed, Apr 27, 2022 at 1:03 AM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > Am 06.04.2022 um 09:59 hat Xie Yongji geschrieben:
> > > > This implements a VDUSE block backend based on
> > > > the libvduse library. We can use it to export the BDSs
> > > > for both VM and container (host) usage.
> > > >
> > > > The new command-line syntax is:
> > > >
> > > > $ qemu-storage-daemon \
> > > > --blockdev file,node-name=drive0,filename=test.img \
> > > > --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on
> > > >
> > > > After the qemu-storage-daemon started, we need to use
> > > > the "vdpa" command to attach the device to vDPA bus:
> > > >
> > > > $ vdpa dev add name vduse-export0 mgmtdev vduse
> > > >
> > > > Also the device must be removed via the "vdpa" command
> > > > before we stop the qemu-storage-daemon.
> > > >
> > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > >
> > > The request handling code is almost the same as for the vhost-user-blk
> > > export. I wonder if we could share this code instead of copying.
> > >
> >
> > I think we can. Will do it in v5.
> >
> > > The main difference seems to be that you chose not to support discard
> > > and write_zeroes yet. I'm curious if there is a reason why the
> > > vhost-user-blk code wouldn't work for vdpa there?
> > >
> >
> > They are different protocols. The data plane is similar, so we can
> > share some code. But the control plane is different, e.g., vhost-user
> > can only work for guests but vdpa can work for both guests and hosts.
>
> Yes, sure, but discard/write_zeroes are part of the data plane, no?
> You're already sharing (or at the moment copying) the code for the other
> request types mostly unchanged, so I wondered what is different about
> discard/write_zeroes.
>
I get your point. There is no limitation on discard/write_zeroes
support for vduse. I will do it in v5.
Thanks,
Yongji
Thread overview: 11+ messages
2022-04-06 7:59 [PATCH v4 0/6] Support exporting BDSs via VDUSE Xie Yongji
2022-04-06 7:59 ` [PATCH v4 1/6] block: Support passing NULL ops to blk_set_dev_ops() Xie Yongji
2022-04-06 7:59 ` [PATCH v4 2/6] linux-headers: Add vduse.h Xie Yongji
2022-04-06 7:59 ` [PATCH v4 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library Xie Yongji
2022-04-06 7:59 ` [PATCH v4 4/6] vduse-blk: implements vduse-blk export Xie Yongji
2022-04-26 17:03 ` Kevin Wolf
2022-04-27 3:11 ` Yongji Xie
2022-04-27 13:22 ` Kevin Wolf
2022-04-27 13:42 ` Yongji Xie
2022-04-06 7:59 ` [PATCH v4 5/6] vduse-blk: Add vduse-blk resize support Xie Yongji
2022-04-06 7:59 ` [PATCH v4 6/6] libvduse: Add support for reconnecting Xie Yongji