* [PATCH v3 0/5] vhost-user block device backend implementation
@ 2020-02-12 9:51 Coiby Xu
2020-02-12 9:51 ` [PATCH v3 1/5] extend libvhost to support IOThread and coroutine Coiby Xu
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Coiby Xu @ 2020-02-12 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, Coiby Xu, stefanha
v3:
* separate generic vhost-user-server code from vhost-user-blk-server
code
* re-write vu_message_read and kick hander function as coroutines to
directly call blk_co_preadv, blk_co_pwritev, etc.
* add aio_context notifier functions to support multi-threading model
* other fixes regarding coding style, warning report, etc.
v2:
* Only enable this feauture for Linux because eventfd is a Linux-specific
feature
This patch series is an implementation of vhost-user block device
backend server, thanks to Stefan and Kevin's guidance.
Vhost-user block device backend server is a UserCreatable object and can be
started using object_add,
(qemu) object_add vhost-user-blk-server,id=ID,unix-socket=/tmp/vhost-user-blk_vhost.socket,node-name=DRIVE_NAME,writable=off,blk-size=512
(qemu) object_del ID
or appending the "-object" option when starting QEMU,
$ -object vhost-user-blk-server,id=disk,unix-socket=/tmp/vhost-user-blk_vhost.socket,node-name=DRIVE_NAME,writable=off,blk-size=512
Then vhost-user client can connect to the server backend.
For example, QEMU could act as a client,
$ -m 256 -object memory-backend-memfd,id=mem,size=256M,share=on -numa node,memdev=mem -chardev socket,id=char1,path=/tmp/vhost-user-blk_vhost.socket -device vhost-user-blk-pci,id=blk0,chardev=char1
And guest OS could access this vhost-user block device after mouting it.
Coiby Xu (5):
extend libvhost to support IOThread and coroutine
generic vhost user server
vhost-user block device backend server
a standone-alone tool to directly share disk image file via vhost-user
protocol
new qTest case to test the vhost-user-blk-server
Makefile | 4 +
Makefile.target | 1 +
backends/Makefile.objs | 2 +
backends/vhost-user-blk-server.c | 716 ++++++++++++++++++++++++++
backends/vhost-user-blk-server.h | 21 +
configure | 3 +
contrib/libvhost-user/libvhost-user.c | 54 +-
contrib/libvhost-user/libvhost-user.h | 38 +-
qemu-vu.c | 252 +++++++++
tests/libqos/vhost-user-blk.c | 126 +++++
tests/libqos/vhost-user-blk.h | 44 ++
tests/vhost-user-blk-test.c | 694 +++++++++++++++++++++++++
util/Makefile.objs | 3 +
util/vhost-user-server.c | 429 ++++++++++++++++
util/vhost-user-server.h | 56 ++
vl.c | 4 +
16 files changed, 2440 insertions(+), 7 deletions(-)
create mode 100644 backends/vhost-user-blk-server.c
create mode 100644 backends/vhost-user-blk-server.h
create mode 100644 qemu-vu.c
create mode 100644 tests/libqos/vhost-user-blk.c
create mode 100644 tests/libqos/vhost-user-blk.h
create mode 100644 tests/vhost-user-blk-test.c
create mode 100644 util/vhost-user-server.c
create mode 100644 util/vhost-user-server.h
--
2.25.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 1/5] extend libvhost to support IOThread and coroutine
2020-02-12 9:51 [PATCH v3 0/5] vhost-user block device backend implementation Coiby Xu
@ 2020-02-12 9:51 ` Coiby Xu
2020-02-12 9:51 ` [PATCH v3 2/5] generic vhost user server Coiby Xu
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Coiby Xu @ 2020-02-12 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, Coiby Xu, stefanha
Previously libvhost dispatch events in its own GMainContext. Now vhost-user
client's kick event can be dispatched in block device drive's AioContext
thus IOThread is supported. And also allow vu_message_read and
vu_kick_cb to be replaced so QEMU can run them as coroutines.
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
---
contrib/libvhost-user/libvhost-user.c | 54 ++++++++++++++++++++++++---
contrib/libvhost-user/libvhost-user.h | 38 ++++++++++++++++++-
2 files changed, 85 insertions(+), 7 deletions(-)
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index b89bf18501..f95664bb22 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -67,8 +67,6 @@
/* The version of inflight buffer */
#define INFLIGHT_VERSION 1
-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
-
/* The version of the protocol we support */
#define VHOST_USER_VERSION 1
#define LIBVHOST_USER_DEBUG 0
@@ -260,7 +258,7 @@ have_userfault(void)
}
static bool
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
+vu_message_read_(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
{
char control[CMSG_SPACE(VHOST_MEMORY_MAX_NREGIONS * sizeof(int))] = { };
struct iovec iov = {
@@ -328,6 +326,17 @@ fail:
return false;
}
+static bool vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
+{
+ vu_read_msg_cb read_msg;
+ if (dev->co_iface) {
+ read_msg = dev->co_iface->read_msg;
+ } else {
+ read_msg = vu_message_read_;
+ }
+ return read_msg(dev, conn_fd, vmsg);
+}
+
static bool
vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
{
@@ -1075,9 +1084,14 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
}
if (dev->vq[index].kick_fd != -1 && dev->vq[index].handler) {
+ if (dev->set_watch_packed_data) {
+ dev->set_watch_packed_data(dev, dev->vq[index].kick_fd, VU_WATCH_IN,
+ dev->co_iface->kick_callback,
+ (void *)(long)index);
+ } else {
dev->set_watch(dev, dev->vq[index].kick_fd, VU_WATCH_IN,
vu_kick_cb, (void *)(long)index);
-
+ }
DPRINT("Waiting for kicks on fd: %d for vq: %d\n",
dev->vq[index].kick_fd, index);
}
@@ -1097,8 +1111,14 @@ void vu_set_queue_handler(VuDev *dev, VuVirtq *vq,
vq->handler = handler;
if (vq->kick_fd >= 0) {
if (handler) {
+ if (dev->set_watch_packed_data) {
+ dev->set_watch_packed_data(dev, vq->kick_fd, VU_WATCH_IN,
+ dev->co_iface->kick_callback,
+ (void *)(long)qidx);
+ } else {
dev->set_watch(dev, vq->kick_fd, VU_WATCH_IN,
vu_kick_cb, (void *)(long)qidx);
+ }
} else {
dev->remove_watch(dev, vq->kick_fd);
}
@@ -1627,6 +1647,12 @@ vu_deinit(VuDev *dev)
}
if (vq->kick_fd != -1) {
+ /* remove watch for kick_fd
+ * When client process is running in gdb and
+ * quit command is run in gdb, QEMU will still dispatch the event
+ * which will cause segment fault in the callback function
+ */
+ dev->remove_watch(dev, vq->kick_fd);
close(vq->kick_fd);
vq->kick_fd = -1;
}
@@ -1682,7 +1708,7 @@ vu_init(VuDev *dev,
assert(max_queues > 0);
assert(socket >= 0);
- assert(set_watch);
+ /* assert(set_watch); */
assert(remove_watch);
assert(iface);
assert(panic);
@@ -1715,6 +1741,24 @@ vu_init(VuDev *dev,
return true;
}
+bool
+vu_init_packed_data(VuDev *dev,
+ uint16_t max_queues,
+ int socket,
+ vu_panic_cb panic,
+ vu_set_watch_cb_packed_data set_watch_packed_data,
+ vu_remove_watch_cb remove_watch,
+ const VuDevIface *iface,
+ const CoIface *co_iface)
+{
+ if (vu_init(dev, max_queues, socket, panic, NULL, remove_watch, iface)) {
+ dev->set_watch_packed_data = set_watch_packed_data;
+ dev->co_iface = co_iface;
+ return true;
+ }
+ return false;
+}
+
VuVirtq *
vu_get_queue(VuDev *dev, int qidx)
{
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 5cb7708559..6aadeaa0f2 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -30,6 +30,8 @@
#define VHOST_MEMORY_MAX_NREGIONS 8
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
+
typedef enum VhostSetConfigType {
VHOST_SET_CONFIG_TYPE_MASTER = 0,
VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
@@ -201,6 +203,7 @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
int *do_reply);
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
@@ -208,6 +211,20 @@ typedef int (*vu_set_config_cb) (VuDev *dev, const uint8_t *data,
uint32_t offset, uint32_t size,
uint32_t flags);
+typedef void (*vu_watch_cb_packed_data) (void *packed_data);
+
+typedef void (*vu_set_watch_cb_packed_data) (VuDev *dev, int fd, int condition,
+ vu_watch_cb_packed_data cb,
+ void *data);
+/*
+ * allowing vu_read_msg_cb and kick_callback to be replaced so QEMU
+ * can run them as coroutines
+ */
+typedef struct CoIface {
+ vu_read_msg_cb read_msg;
+ vu_watch_cb_packed_data kick_callback;
+} CoIface;
+
typedef struct VuDevIface {
/* called by VHOST_USER_GET_FEATURES to get the features bitmask */
vu_get_features_cb get_features;
@@ -372,7 +389,8 @@ struct VuDev {
/* @set_watch: add or update the given fd to the watch set,
* call cb when condition is met */
vu_set_watch_cb set_watch;
-
+ /* AIO dispatch will only one data pointer to callback function */
+ vu_set_watch_cb_packed_data set_watch_packed_data;
/* @remove_watch: remove the given fd from the watch set */
vu_remove_watch_cb remove_watch;
@@ -380,7 +398,7 @@ struct VuDev {
* re-initialize */
vu_panic_cb panic;
const VuDevIface *iface;
-
+ const CoIface *co_iface;
/* Postcopy data */
int postcopy_ufd;
bool postcopy_listening;
@@ -417,6 +435,22 @@ bool vu_init(VuDev *dev,
const VuDevIface *iface);
+/**
+ * vu_init_packed_data:
+ * Same as vu_init except for set_watch_packed_data which will pack
+ * two parameters into a struct thus QEMU aio_dispatch can pass the
+ * required data to callback function.
+ *
+ * Returns: true on success, false on failure.
+ **/
+bool vu_init_packed_data(VuDev *dev,
+ uint16_t max_queues,
+ int socket,
+ vu_panic_cb panic,
+ vu_set_watch_cb_packed_data set_watch_packed_data,
+ vu_remove_watch_cb remove_watch,
+ const VuDevIface *iface,
+ const CoIface *co_iface);
/**
* vu_deinit:
* @dev: a VuDev context
--
2.25.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 2/5] generic vhost user server
2020-02-12 9:51 [PATCH v3 0/5] vhost-user block device backend implementation Coiby Xu
2020-02-12 9:51 ` [PATCH v3 1/5] extend libvhost to support IOThread and coroutine Coiby Xu
@ 2020-02-12 9:51 ` Coiby Xu
2020-02-12 9:51 ` [PATCH v3 3/5] vhost-user block device backend server Coiby Xu
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Coiby Xu @ 2020-02-12 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, Coiby Xu, stefanha
Sharing QEMU devices via vhost-user protocol
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
---
util/Makefile.objs | 3 +
util/vhost-user-server.c | 429 +++++++++++++++++++++++++++++++++++++++
util/vhost-user-server.h | 56 +++++
3 files changed, 489 insertions(+)
create mode 100644 util/vhost-user-server.c
create mode 100644 util/vhost-user-server.h
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 11262aafaf..5e450e501c 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -36,6 +36,9 @@ util-obj-y += readline.o
util-obj-y += rcu.o
util-obj-$(CONFIG_MEMBARRIER) += sys_membarrier.o
util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
+ifdef CONFIG_LINUX
+util-obj-y += vhost-user-server.o
+endif
util-obj-y += qemu-coroutine-sleep.o
util-obj-y += qemu-co-shared-resource.o
util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
new file mode 100644
index 0000000000..0766b414c3
--- /dev/null
+++ b/util/vhost-user-server.c
@@ -0,0 +1,429 @@
+/*
+ * Sharing QEMU devices via vhost-user protocol
+ *
+ * Author: Coiby Xu <coiby.xu@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include <sys/eventfd.h>
+#include "qemu/main-loop.h"
+#include "vhost-user-server.h"
+
+static void vmsg_close_fds(VhostUserMsg *vmsg)
+{
+ int i;
+ for (i = 0; i < vmsg->fd_num; i++) {
+ close(vmsg->fds[i]);
+ }
+}
+
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
+{
+ int i;
+ for (i = 0; i < vmsg->fd_num; i++) {
+ qemu_set_nonblock(vmsg->fds[i]);
+ }
+}
+
+
+static void close_client(VuClient *client)
+{
+ vu_deinit(&client->parent);
+ client->sioc = NULL;
+ object_unref(OBJECT(client->ioc));
+ client->closed = true;
+
+}
+
+static void panic_cb(VuDev *vu_dev, const char *buf)
+{
+ if (buf) {
+ error_report("vu_panic: %s", buf);
+ }
+
+ VuClient *client = container_of(vu_dev, VuClient, parent);
+ VuServer *server = client->server;
+ if (!client->closed) {
+ close_client(client);
+ QTAILQ_REMOVE(&server->clients, client, next);
+ }
+
+ if (server->device_panic_notifier) {
+ server->device_panic_notifier(client);
+ }
+}
+
+
+
+static bool coroutine_fn
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
+{
+ struct iovec iov = {
+ .iov_base = (char *)vmsg,
+ .iov_len = VHOST_USER_HDR_SIZE,
+ };
+ int rc, read_bytes = 0;
+ /*
+ * VhostUserMsg is a packed structure, gcc will complain about passing
+ * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
+ * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
+ * thus two temporary variables nfds and fds are used here.
+ */
+ size_t nfds = 0, nfds_t = 0;
+ int *fds = NULL, *fds_t = NULL;
+ VuClient *client = container_of(vu_dev, VuClient, parent);
+ QIOChannel *ioc = client->ioc;
+
+ Error *erp;
+ assert(qemu_in_coroutine());
+ do {
+ /*
+ * qio_channel_readv_full may have short reads, keeping calling it
+ * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
+ */
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &erp);
+ if (rc < 0) {
+ if (rc == QIO_CHANNEL_ERR_BLOCK) {
+ qio_channel_yield(ioc, G_IO_IN);
+ continue;
+ } else {
+ error_report("Error while recvmsg: %s", strerror(errno));
+ return false;
+ }
+ }
+ read_bytes += rc;
+ fds = g_renew(int, fds_t, nfds + nfds_t);
+ memcpy(fds + nfds, fds_t, nfds_t);
+ nfds += nfds_t;
+ if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
+ break;
+ }
+ } while (true);
+
+ vmsg->fd_num = nfds;
+ memcpy(vmsg->fds, fds, nfds * sizeof(int));
+ g_free(fds);
+ /* qio_channel_readv_full will make socket fds blocking, unblock them */
+ vmsg_unblock_fds(vmsg);
+ if (vmsg->size > sizeof(vmsg->payload)) {
+ error_report("Error: too big message request: %d, "
+ "size: vmsg->size: %u, "
+ "while sizeof(vmsg->payload) = %zu",
+ vmsg->request, vmsg->size, sizeof(vmsg->payload));
+ goto fail;
+ }
+
+ struct iovec iov_payload = {
+ .iov_base = (char *)&vmsg->payload,
+ .iov_len = vmsg->size,
+ };
+ if (vmsg->size) {
+ rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &erp);
+ if (rc == -1) {
+ error_report("Error while reading: %s", strerror(errno));
+ goto fail;
+ }
+ }
+
+ return true;
+
+fail:
+ vmsg_close_fds(vmsg);
+
+ return false;
+}
+
+
+static coroutine_fn void vu_client_next_trip(VuClient *client);
+
+static coroutine_fn void vu_client_trip(void *opaque)
+{
+ VuClient *client = opaque;
+
+ vu_dispatch(&client->parent);
+ client->co_trip = NULL;
+ if (!client->closed) {
+ vu_client_next_trip(client);
+ }
+}
+
+static coroutine_fn void vu_client_next_trip(VuClient *client)
+{
+ if (!client->co_trip) {
+ client->co_trip = qemu_coroutine_create(vu_client_trip, client);
+ aio_co_schedule(client->ioc->ctx, client->co_trip);
+ }
+}
+
+static void vu_client_start(VuClient *client)
+{
+ client->co_trip = qemu_coroutine_create(vu_client_trip, client);
+ aio_co_enter(client->ioc->ctx, client->co_trip);
+}
+
+static void coroutine_fn vu_kick_cb_next(VuClient *client,
+ kick_info *data);
+
+static void coroutine_fn vu_kick_cb(void *opaque)
+{
+ kick_info *data = (kick_info *) opaque;
+ int index = data->index;
+ VuDev *dev = data->vu_dev;
+ VuClient *client;
+ client = container_of(dev, VuClient, parent);
+ VuVirtq *vq = &dev->vq[index];
+ int sock = vq->kick_fd;
+ if (sock == -1) {
+ return;
+ }
+ assert(sock == data->fd);
+ eventfd_t kick_data;
+ ssize_t rc;
+ /*
+ * When eventfd is closed, the revent is POLLNVAL (=G_IO_NVAL) and
+ * reading eventfd will return errno=EBADF (Bad file number).
+ * Calling qio_channel_yield(ioc, G_IO_IN) will set reading handler
+ * for QIOChannel, but aio_dispatch_handlers will only dispatch
+ * G_IO_IN | G_IO_HUP | G_IO_ERR revents while ignoring
+ * G_IO_NVAL (POLLNVAL) revents.
+ *
+ * Thus when eventfd is closed by vhost-user client, QEMU will ignore
+ * G_IO_NVAL and keeping polling by repeatedly calling qemu_poll_ns which
+ * will lead to 100% CPU usage.
+ *
+ * To aovid this issue, make sure set_watch and remove_watch use the same
+ * AIOContext for QIOChannel. Thus remove_watch will eventually succefully
+ * remove eventfd from the set of file descriptors polled for
+ * corresponding GSource.
+ */
+ rc = read(sock, &kick_data, sizeof(eventfd_t));
+ if (rc != sizeof(eventfd_t)) {
+ if (errno == EAGAIN) {
+ qio_channel_yield(data->ioc, G_IO_IN);
+ } else if (errno != EINTR) {
+ data->co = NULL;
+ return;
+ }
+ } else {
+ vq->handler(dev, index);
+ }
+ data->co = NULL;
+ vu_kick_cb_next(client, data);
+
+}
+
+static void coroutine_fn vu_kick_cb_next(VuClient *client,
+ kick_info *cb_data)
+{
+ if (!cb_data->co) {
+ cb_data->co = qemu_coroutine_create(vu_kick_cb, cb_data);
+ aio_co_schedule(client->ioc->ctx, cb_data->co);
+ }
+}
+static const CoIface co_iface = {
+ .read_msg = vu_message_read,
+ .kick_callback = vu_kick_cb,
+};
+
+
+static void
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
+ vu_watch_cb_packed_data cb, void *pvt)
+{
+ /*
+ * since aio_dispatch can only pass one user data pointer to the
+ * callback function, pack VuDev, pvt into a struct
+ */
+
+ VuClient *client;
+
+ client = container_of(vu_dev, VuClient, parent);
+ g_assert(vu_dev);
+ g_assert(fd >= 0);
+ long index = (intptr_t) pvt;
+ g_assert(cb);
+ kick_info *kick_info = &client->kick_info[index];
+ if (!kick_info->co) {
+ kick_info->fd = fd;
+ QIOChannelFile *fioc = qio_channel_file_new_fd(fd);
+ QIOChannel *ioc = QIO_CHANNEL(fioc);
+ ioc->ctx = client->ioc->ctx;
+ qio_channel_set_blocking(QIO_CHANNEL(ioc), false, NULL);
+ kick_info->fioc = fioc;
+ kick_info->ioc = ioc;
+ kick_info->vu_dev = vu_dev;
+ kick_info->co = qemu_coroutine_create(cb, kick_info);
+ aio_co_enter(client->ioc->ctx, kick_info->co);
+ }
+}
+
+
+static void remove_watch(VuDev *vu_dev, int fd)
+{
+ VuClient *client;
+ int i;
+ int index = -1;
+ g_assert(vu_dev);
+ g_assert(fd >= 0);
+
+ client = container_of(vu_dev, VuClient, parent);
+ for (i = 0; i < vu_dev->max_queues; i++) {
+ if (client->kick_info[i].fd == fd) {
+ index = i;
+ break;
+ }
+ }
+
+ if (index == -1) {
+ return;
+ }
+
+ kick_info *kick_info = &client->kick_info[index];
+ if (kick_info->ioc) {
+ aio_set_fd_handler(client->ioc->ctx, fd, false, NULL,
+ NULL, NULL, NULL);
+ kick_info->ioc = NULL;
+ g_free(kick_info->fioc);
+ kick_info->co = NULL;
+ kick_info->fioc = NULL;
+ }
+}
+
+
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
+ gpointer opaque)
+{
+ VuClient *client;
+ VuServer *server = opaque;
+ client = g_new0(VuClient, 1);
+
+ if (!vu_init_packed_data(&client->parent, server->max_queues,
+ sioc->fd, panic_cb,
+ set_watch, remove_watch,
+ server->vu_iface, &co_iface)) {
+ error_report("Failed to initialized libvhost-user");
+ g_free(client);
+ return;
+ }
+
+ client->server = server;
+ client->sioc = sioc;
+ client->kick_info = g_new0(struct kick_info, server->max_queues);
+ /*
+ * increase the object reference, so cioc will not freed by
+ * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
+ */
+ object_ref(OBJECT(client->sioc));
+ qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
+ client->ioc = QIO_CHANNEL(sioc);
+ object_ref(OBJECT(client->ioc));
+ object_ref(OBJECT(sioc));
+ qio_channel_attach_aio_context(client->ioc, server->ctx);
+ qio_channel_set_blocking(QIO_CHANNEL(client->sioc), false, NULL);
+ client->closed = false;
+ QTAILQ_INSERT_TAIL(&server->clients, client, next);
+ vu_client_start(client);
+}
+
+
+void vhost_user_server_stop(VuServer *server)
+{
+ if (!server) {
+ return;
+ }
+
+ VuClient *client, *next;
+ QTAILQ_FOREACH_SAFE(client, &server->clients, next, next) {
+ if (!client->closed) {
+ close_client(client);
+ QTAILQ_REMOVE(&server->clients, client, next);
+ }
+ }
+
+ if (server->listener) {
+ qio_net_listener_disconnect(server->listener);
+ object_unref(OBJECT(server->listener));
+ }
+}
+
+static void detach_context(VuServer *server)
+{
+ VuClient *client;
+ int i;
+ QTAILQ_FOREACH(client, &server->clients, next) {
+ qio_channel_detach_aio_context(client->ioc);
+ for (i = 0; i < client->parent.max_queues; i++) {
+ if (client->kick_info[i].ioc) {
+ qio_channel_detach_aio_context(client->kick_info[i].ioc);
+ }
+ }
+ }
+}
+
+static void attach_context(VuServer *server, AioContext *ctx)
+{
+ VuClient *client;
+ int i;
+ QTAILQ_FOREACH(client, &server->clients, next) {
+ qio_channel_attach_aio_context(client->ioc, ctx);
+ if (client->co_trip) {
+ aio_co_schedule(ctx, client->co_trip);
+ }
+ for (i = 0; i < client->parent.max_queues; i++) {
+ if (client->kick_info[i].co) {
+ qio_channel_attach_aio_context(client->kick_info[i].ioc, ctx);
+ aio_co_schedule(ctx, client->kick_info[i].co);
+ }
+ }
+ }
+}
+void change_vu_context(AioContext *ctx, VuServer *server)
+{
+ AioContext *acquire_ctx = ctx ? ctx : server->ctx;
+ aio_context_acquire(acquire_ctx);
+ server->ctx = ctx ? ctx : qemu_get_aio_context();
+ if (ctx) {
+ attach_context(server, ctx);
+ } else {
+ detach_context(server);
+ }
+ aio_context_release(acquire_ctx);
+}
+
+
+VuServer *vhost_user_server_start(uint16_t max_queues,
+ char *unix_socket,
+ AioContext *ctx,
+ void *server_ptr,
+ void *device_panic_notifier,
+ const VuDevIface *vu_iface,
+ Error **errp)
+{
+ VuServer *server = g_new0(VuServer, 1);
+ server->ptr_in_device = server_ptr;
+ server->listener = qio_net_listener_new();
+ SocketAddress addr = {};
+ addr.u.q_unix.path = (char *) unix_socket;
+ addr.type = SOCKET_ADDRESS_TYPE_UNIX;
+ if (qio_net_listener_open_sync(server->listener, &addr, 1, errp) < 0) {
+ goto error;
+ }
+
+ qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
+
+ server->vu_iface = vu_iface;
+ server->max_queues = max_queues;
+ server->ctx = ctx;
+ qio_net_listener_set_client_func(server->listener,
+ vu_accept,
+ server,
+ NULL);
+
+ QTAILQ_INIT(&server->clients);
+ return server;
+error:
+ g_free(server);
+ return NULL;
+}
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
new file mode 100644
index 0000000000..2172d67937
--- /dev/null
+++ b/util/vhost-user-server.h
@@ -0,0 +1,56 @@
+#include "io/channel-socket.h"
+#include "io/channel-file.h"
+#include "io/net-listener.h"
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "qemu/error-report.h"
+
+typedef struct VuClient VuClient;
+
+typedef struct VuServer {
+ QIONetListener *listener;
+ AioContext *ctx;
+ QTAILQ_HEAD(, VuClient) clients;
+ void (*device_panic_notifier)(struct VuClient *client) ;
+ int max_queues;
+ const VuDevIface *vu_iface;
+ /*
+ * @ptr_in_device: VuServer pointer memory location in vhost-user device
+ * struct, so later container_of can be used to get device destruct
+ */
+ void *ptr_in_device;
+ bool close;
+} VuServer;
+
+typedef struct kick_info {
+ VuDev *vu_dev;
+ int fd; /*kick fd*/
+ long index; /*queue index*/
+ QIOChannel *ioc; /*I/O channel for kick fd*/
+ QIOChannelFile *fioc; /*underlying data channel for kick fd*/
+ Coroutine *co;
+} kick_info;
+
+struct VuClient {
+ VuDev parent;
+ VuServer *server;
+ QIOChannel *ioc; /* The current I/O channel */
+ QIOChannelSocket *sioc; /* The underlying data channel */
+ Coroutine *co_trip;
+ struct kick_info *kick_info;
+ QTAILQ_ENTRY(VuClient) next;
+ bool closed;
+};
+
+
+VuServer *vhost_user_server_start(uint16_t max_queues,
+ char *unix_socket,
+ AioContext *ctx,
+ void *server_ptr,
+ void *device_panic_notifier,
+ const VuDevIface *vu_iface,
+ Error **errp);
+
+void vhost_user_server_stop(VuServer *server);
+
+void change_vu_context(AioContext *ctx, VuServer *server);
--
2.25.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 3/5] vhost-user block device backend server
2020-02-12 9:51 [PATCH v3 0/5] vhost-user block device backend implementation Coiby Xu
2020-02-12 9:51 ` [PATCH v3 1/5] extend libvhost to support IOThread and coroutine Coiby Xu
2020-02-12 9:51 ` [PATCH v3 2/5] generic vhost user server Coiby Xu
@ 2020-02-12 9:51 ` Coiby Xu
2020-02-12 9:51 ` [PATCH v3 4/5] a standone-alone tool to directly share disk image file via vhost-user protocol Coiby Xu
2020-02-12 9:51 ` [PATCH v3 5/5] new qTest case to test the vhost-user-blk-server Coiby Xu
4 siblings, 0 replies; 7+ messages in thread
From: Coiby Xu @ 2020-02-12 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, Coiby Xu, stefanha
By making use of libvhost, multiple block device drives can be exported
and each drive can serve multiple clients simultaneously.
Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
---
Makefile.target | 1 +
backends/Makefile.objs | 2 +
backends/vhost-user-blk-server.c | 716 +++++++++++++++++++++++++++++++
backends/vhost-user-blk-server.h | 21 +
vl.c | 4 +
5 files changed, 744 insertions(+)
create mode 100644 backends/vhost-user-blk-server.c
create mode 100644 backends/vhost-user-blk-server.h
diff --git a/Makefile.target b/Makefile.target
index 6e61f607b1..8c6c01eb3a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -159,6 +159,7 @@ obj-y += monitor/
obj-y += qapi/
obj-y += memory.o
obj-y += memory_mapping.o
+obj-$(CONFIG_LINUX) += ../contrib/libvhost-user/libvhost-user.o
obj-y += migration/ram.o
LIBS := $(libs_softmmu) $(LIBS)
diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 28a847cd57..4e7be731e0 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -14,6 +14,8 @@ common-obj-y += cryptodev-vhost.o
common-obj-$(CONFIG_VHOST_CRYPTO) += cryptodev-vhost-user.o
endif
+common-obj-$(CONFIG_LINUX) += vhost-user-blk-server.o
+
common-obj-$(call land,$(CONFIG_VHOST_USER),$(CONFIG_VIRTIO)) += vhost-user.o
common-obj-$(CONFIG_LINUX) += hostmem-memfd.o
diff --git a/backends/vhost-user-blk-server.c b/backends/vhost-user-blk-server.c
new file mode 100644
index 0000000000..7293ad87be
--- /dev/null
+++ b/backends/vhost-user-blk-server.c
@@ -0,0 +1,716 @@
+/*
+ * Sharing QEMU block devices via vhost-user protocal
+ *
+ * Author: Coiby Xu <coiby.xu@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/block.h"
+#include "vhost-user-blk-server.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/block-backend.h"
+
+enum {
+ VHOST_USER_BLK_MAX_QUEUES = 1,
+};
+struct virtio_blk_inhdr {
+ unsigned char status;
+};
+
+static QTAILQ_HEAD(, VuBlockDev) vu_block_devs =
+ QTAILQ_HEAD_INITIALIZER(vu_block_devs);
+
+
+typedef struct VuBlockReq {
+ VuVirtqElement *elem;
+ int64_t sector_num;
+ size_t size;
+ struct virtio_blk_inhdr *in;
+ struct virtio_blk_outhdr out;
+ VuClient *client;
+ struct VuVirtq *vq;
+} VuBlockReq;
+
+
+static void vu_block_req_complete(VuBlockReq *req)
+{
+ VuDev *vu_dev = &req->client->parent;
+
+ /* IO size with 1 extra status byte */
+ vu_queue_push(vu_dev, req->vq, req->elem,
+ req->size + 1);
+ vu_queue_notify(vu_dev, req->vq);
+
+ if (req->elem) {
+ free(req->elem);
+ }
+
+ g_free(req);
+}
+
+static VuBlockDev *get_vu_block_device_by_client(VuClient *client)
+{
+ return container_of(client->server->ptr_in_device, VuBlockDev, vu_server);
+}
+
+static int coroutine_fn
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
+ uint32_t iovcnt, uint32_t type)
+{
+ struct virtio_blk_discard_write_zeroes desc;
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
+ if (unlikely(size != sizeof(desc))) {
+ error_report("Invalid size %ld, expect %ld", size, sizeof(desc));
+ return -1;
+ }
+
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(req->client);
+ uint64_t range[2] = { le64toh(desc.sector) << 9,
+ le32toh(desc.num_sectors) << 9 };
+ if (type == VIRTIO_BLK_T_DISCARD) {
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
+ return 0;
+ }
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
+ range[0], range[1], 0) == 0) {
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
+{
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(req->client);
+ BlockBackend *backend = vdev_blk->backend;
+ blk_co_flush(backend);
+}
+
+
+static int coroutine_fn vu_block_virtio_process_req(VuClient *client,
+ VuVirtq *vq)
+{
+ VuDev *vu_dev = &client->parent;
+ VuVirtqElement *elem;
+ uint32_t type;
+ VuBlockReq *req;
+
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(client);
+ BlockBackend *backend = vdev_blk->backend;
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
+ sizeof(VuBlockReq));
+ if (!elem) {
+ return -1;
+ }
+
+ struct iovec *in_iov = elem->in_sg;
+ struct iovec *out_iov = elem->out_sg;
+ unsigned in_num = elem->in_num;
+ unsigned out_num = elem->out_num;
+ /* refer to hw/block/virtio_blk.c */
+ if (elem->out_num < 1 || elem->in_num < 1) {
+ error_report("virtio-blk request missing headers");
+ free(elem);
+ return -1;
+ }
+
+ req = g_new0(VuBlockReq, 1);
+ req->client = client;
+ req->vq = vq;
+ req->elem = elem;
+
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
+ sizeof(req->out)) != sizeof(req->out))) {
+ error_report("virtio-blk request outhdr too short");
+ goto err;
+ }
+
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
+
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
+ error_report("virtio-blk request inhdr too short");
+ goto err;
+ }
+
+ /* We always touch the last byte, so just see how big in_iov is. */
+ req->in = (void *)in_iov[in_num - 1].iov_base
+ + in_iov[in_num - 1].iov_len
+ - sizeof(struct virtio_blk_inhdr);
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
+
+
+ type = le32toh(req->out.type);
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
+ case VIRTIO_BLK_T_IN:
+ case VIRTIO_BLK_T_OUT: {
+ ssize_t ret = 0;
+ bool is_write = type & VIRTIO_BLK_T_OUT;
+ req->sector_num = le64toh(req->out.sector);
+
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
+ QEMUIOVector *qiov = g_new0(QEMUIOVector, 1);
+ if (is_write) {
+ qemu_iovec_init_external(qiov, out_iov, out_num);
+ ret = blk_co_pwritev(backend, offset, qiov->size,
+ qiov, 0);
+ } else {
+ qemu_iovec_init_external(qiov, in_iov, in_num);
+ ret = blk_co_preadv(backend, offset, qiov->size,
+ qiov, 0);
+ }
+ aio_wait_kick();
+ if (ret >= 0) {
+ req->in->status = VIRTIO_BLK_S_OK;
+ } else {
+ req->in->status = VIRTIO_BLK_S_IOERR;
+ }
+ vu_block_req_complete(req);
+ break;
+ }
+ case VIRTIO_BLK_T_FLUSH:
+ vu_block_flush(req);
+ req->in->status = VIRTIO_BLK_S_OK;
+ vu_block_req_complete(req);
+ break;
+ case VIRTIO_BLK_T_GET_ID: {
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
+ VIRTIO_BLK_ID_BYTES);
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk_server");
+ req->in->status = VIRTIO_BLK_S_OK;
+ req->size = elem->in_sg[0].iov_len;
+ vu_block_req_complete(req);
+ break;
+ }
+ case VIRTIO_BLK_T_DISCARD:
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
+ int rc;
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
+ out_num, type);
+ if (rc == 0) {
+ req->in->status = VIRTIO_BLK_S_OK;
+ } else {
+ req->in->status = VIRTIO_BLK_S_IOERR;
+ }
+ vu_block_req_complete(req);
+ break;
+ }
+ default:
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
+ vu_block_req_complete(req);
+ break;
+ }
+
+ return 0;
+
+err:
+ free(elem);
+ g_free(req);
+ return -1;
+}
+
+
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
+{
+ VuClient *client;
+ VuVirtq *vq;
+ int ret;
+
+ client = container_of(vu_dev, VuClient, parent);
+ assert(client);
+
+ vq = vu_get_queue(vu_dev, idx);
+ assert(vq);
+
+ while (1) {
+ ret = vu_block_virtio_process_req(client, vq);
+ if (ret) {
+ break;
+ }
+ }
+}
+
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
+{
+ VuVirtq *vq;
+
+ assert(vu_dev);
+
+ vq = vu_get_queue(vu_dev, idx);
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
+}
+
+static uint64_t vu_block_get_features(VuDev *dev)
+{
+ uint64_t features;
+ VuClient *client = container_of(dev, VuClient, parent);
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(client);
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
+ 1ull << VIRTIO_BLK_F_FLUSH |
+ 1ull << VIRTIO_BLK_F_DISCARD |
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
+ 1ull << VIRTIO_F_VERSION_1 |
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
+
+ if (!vdev_blk->writable) {
+ features |= 1ull << VIRTIO_BLK_F_RO;
+ }
+
+ return features;
+}
+
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
+{
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
+}
+
+static int
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
+{
+ VuClient *client = container_of(vu_dev, VuClient, parent);
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(client);
+ memcpy(config, &vdev_blk->blkcfg, len);
+
+ return 0;
+}
+
+static int
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
+ uint32_t offset, uint32_t size, uint32_t flags)
+{
+ VuClient *client = container_of(vu_dev, VuClient, parent);
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(client);
+ uint8_t wce;
+
+ /* don't support live migration */
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
+ return -1;
+ }
+
+
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
+ size != 1) {
+ return -1;
+ }
+
+ wce = *data;
+ if (wce == vdev_blk->blkcfg.wce) {
+ /* Do nothing as same with old configuration */
+ return 0;
+ }
+
+ vdev_blk->blkcfg.wce = wce;
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
+ return 0;
+}
+
+
+/*
+ * When the client disconnects, it send a VHOST_USER_NONE request
+ * and vu_process_message will simple call exit which cause the VM
+ * to exit abruptly.
+ * To avoid this issue, process VHOST_USER_NONE request ahead
+ * of vu_process_message.
+ *
+ */
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
+{
+ if (vmsg->request == VHOST_USER_NONE) {
+ dev->panic(dev, "disconnect");
+ return true;
+ }
+ return false;
+}
+
+
+static const VuDevIface vu_block_iface = {
+ .get_features = vu_block_get_features,
+ .queue_set_started = vu_block_queue_set_started,
+ .get_protocol_features = vu_block_get_protocol_features,
+ .get_config = vu_block_get_config,
+ .set_config = vu_block_set_config,
+ .process_msg = vu_block_process_msg,
+};
+
+
+static void vu_block_free(VuBlockDev *vu_block_dev)
+{
+ if (!vu_block_dev) {
+ return;
+ }
+
+ blk_unref(vu_block_dev->backend);
+
+ if (vu_block_dev->next.tqe_circ.tql_prev) {
+ /*
+ * if vu_block_dev->next.tqe_circ.tql_prev = null,
+ * vu_block_dev hasn't been inserted into the queue and
+ * vu_block_free is called by obj->instance_finalize.
+ */
+ QTAILQ_REMOVE(&vu_block_devs, vu_block_dev, next);
+ }
+}
+
+static void blk_aio_attached(AioContext *ctx, void *opaque)
+{
+ VuBlockDev *vub_dev = opaque;
+ aio_context_acquire(ctx);
+ change_vu_context(ctx, vub_dev->vu_server);
+ aio_context_release(ctx);
+}
+
+
+static void blk_aio_detach(void *opaque)
+{
+ VuBlockDev *vub_dev = opaque;
+ AioContext *ctx = vub_dev->vu_server->ctx;
+ aio_context_acquire(ctx);
+ change_vu_context(NULL, vub_dev->vu_server);
+ aio_context_release(ctx);
+}
+
+
+
+static void
+vu_block_initialize_config(BlockDriverState *bs,
+ struct virtio_blk_config *config, uint32_t blk_size)
+{
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
+ config->blk_size = blk_size;
+ config->size_max = 0;
+ config->seg_max = 128 - 2;
+ config->min_io_size = 1;
+ config->opt_io_size = 1;
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
+ config->max_discard_sectors = 32768;
+ config->max_discard_seg = 1;
+ config->discard_sector_alignment = config->blk_size >> 9;
+ config->max_write_zeroes_sectors = 32768;
+ config->max_write_zeroes_seg = 1;
+}
+
+
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
+{
+
+ BlockBackend *blk;
+ Error *local_error = NULL;
+ const char *node_name = vu_block_device->node_name;
+ bool writable = vu_block_device->writable;
+ /*
+ * Don't allow resize while the vhost user server is running,
+ * otherwise we don't care what happens with the node.
+ */
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
+ int ret;
+
+ AioContext *ctx;
+
+ BlockDriverState *bs = bdrv_lookup_bs(node_name,
+ node_name,
+ &local_error);
+
+ if (!bs) {
+ error_propagate(errp, local_error);
+ return NULL;
+ }
+
+ if (bdrv_is_read_only(bs)) {
+ writable = false;
+ }
+
+ if (writable) {
+ perm |= BLK_PERM_WRITE;
+ }
+
+ ctx = bdrv_get_aio_context(bs);
+ aio_context_acquire(ctx);
+ bdrv_invalidate_cache(bs, NULL);
+ aio_context_release(ctx);
+
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
+ ret = blk_insert_bs(blk, bs, errp);
+
+ if (ret < 0) {
+ goto fail;
+ }
+
+ blk_set_enable_write_cache(blk, false);
+
+ blk_set_allow_aio_context_change(blk, true);
+
+ vu_block_device->blkcfg.wce = 0;
+ vu_block_device->backend = blk;
+ if (!vu_block_device->blk_size) {
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
+ }
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
+ vu_block_device->blk_size);
+ return vu_block_device;
+
+fail:
+ blk_unref(blk);
+ return NULL;
+}
+
+static void vhost_user_blk_server_free(VuBlockDev *vu_block_device)
+{
+ if (!vu_block_device) {
+ return;
+ }
+ vhost_user_server_stop(vu_block_device->vu_server);
+ vu_block_free(vu_block_device);
+
+}
+
+/*
+ * A exported drive can serve multiple multiple clients simutateously,
+ * thus no need to export the same drive twice.
+ *
+ */
+static VuBlockDev *vu_block_dev_find(const char *node_name)
+{
+ VuBlockDev *vu_block_device;
+ QTAILQ_FOREACH(vu_block_device, &vu_block_devs, next) {
+ if (strcmp(node_name, vu_block_device->node_name) == 0) {
+ return vu_block_device;
+ }
+ }
+
+ return NULL;
+}
+
+
+static VuBlockDev *vu_block_dev_find_by_unix_socket(const char *unix_socket)
+{
+ VuBlockDev *vu_block_device;
+ QTAILQ_FOREACH(vu_block_device, &vu_block_devs, next) {
+ if (strcmp(unix_socket, vu_block_device->unix_socket) == 0) {
+ return vu_block_device;
+ }
+ }
+
+ return NULL;
+}
+
+
+static void device_panic_notifier(VuClient *client)
+{
+ VuBlockDev *vdev_blk = get_vu_block_device_by_client(client);
+ if (vdev_blk->exit_when_panic) {
+ vdev_blk->vu_server->close = true;
+ }
+}
+
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
+ Error **errp)
+{
+
+ const char *name = vu_block_device->node_name;
+ char *unix_socket = vu_block_device->unix_socket;
+ if (vu_block_dev_find(name) ||
+ vu_block_dev_find_by_unix_socket(unix_socket)) {
+ error_setg(errp, "Vhost user server with name '%s' or "
+ "with socket_path '%s' has already been started",
+ name, unix_socket);
+ return;
+ }
+
+ if (!vu_block_init(vu_block_device, errp)) {
+ return;
+ }
+
+
+ AioContext *ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
+ VuServer *vu_server = vhost_user_server_start(VHOST_USER_BLK_MAX_QUEUES,
+ unix_socket,
+ ctx,
+ &vu_block_device->vu_server,
+ device_panic_notifier,
+ &vu_block_iface,
+ errp);
+
+ if (!vu_server) {
+ goto error;
+ }
+ vu_block_device->vu_server = vu_server;
+ QTAILQ_INSERT_TAIL(&vu_block_devs, vu_block_device, next);
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
+ blk_aio_detach, vu_block_device);
+ return;
+
+ error:
+ vu_block_free(vu_block_device);
+}
+
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+
+ if (vus->node_name) {
+ error_setg(errp, "evdev property already set");
+ return;
+ }
+
+ vus->node_name = g_strdup(value);
+}
+
+static char *vu_get_node_name(Object *obj, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+ return g_strdup(vus->node_name);
+}
+
+
+static void vu_set_unix_socket(Object *obj, const char *value,
+ Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+
+ if (vus->unix_socket) {
+ error_setg(errp, "unix_socket property already set");
+ return;
+ }
+
+ vus->unix_socket = g_strdup(value);
+}
+
+static char *vu_get_unix_socket(Object *obj, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+ return g_strdup(vus->unix_socket);
+}
+
+static bool vu_get_block_writable(Object *obj, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+ return vus->writable;
+}
+
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+
+ vus->writable = value;
+}
+
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+ uint32_t value = vus->blk_size;
+
+ visit_type_uint32(v, name, &value, errp);
+}
+
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
+
+ Error *local_err = NULL;
+ uint32_t value;
+
+ visit_type_uint32(v, name, &value, &local_err);
+ if (local_err) {
+ goto out;
+ }
+ if (value != BDRV_SECTOR_SIZE && value != 4096) {
+ error_setg(&local_err,
+ "Property '%s.%s' can only take value 512 or 4096",
+ object_get_typename(obj), name);
+ goto out;
+ }
+
+ vus->blk_size = value;
+
+out:
+ error_propagate(errp, local_err);
+ vus->blk_size = value;
+}
+
+static void vhost_user_blk_server_instance_init(Object *obj)
+{
+
+ object_property_add_bool(obj, "writable",
+ vu_get_block_writable,
+ vu_set_block_writable, NULL);
+
+ object_property_add_str(obj, "node-name",
+ vu_get_node_name,
+ vu_set_node_name, NULL);
+
+ object_property_add_str(obj, "unix-socket",
+ vu_get_unix_socket,
+ vu_set_unix_socket, NULL);
+
+ object_property_add(obj, "blk-size", "uint32",
+ vu_get_blk_size, vu_set_blk_size,
+ NULL, NULL, NULL);
+
+}
+
+static void vhost_user_blk_server_instance_finalize(Object *obj)
+{
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
+
+ blk_remove_aio_context_notifier(vub->backend, blk_aio_attached,
+ blk_aio_detach, vub);
+ vhost_user_blk_server_free(vub);
+}
+
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
+{
+ Error *local_error = NULL;
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
+
+ vhost_user_blk_server_start(vub, &local_error);
+
+ if (local_error) {
+ error_propagate(errp, local_error);
+ return;
+ }
+}
+
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
+ void *class_data)
+{
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
+ ucc->complete = vhost_user_blk_server_complete;
+}
+
+static const TypeInfo vhost_user_blk_server_info = {
+ .name = TYPE_VHOST_USER_BLK_SERVER,
+ .parent = TYPE_OBJECT,
+ .instance_size = sizeof(VuBlockDev),
+ .instance_init = vhost_user_blk_server_instance_init,
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
+ .class_init = vhost_user_blk_server_class_init,
+ .interfaces = (InterfaceInfo[]) {
+ {TYPE_USER_CREATABLE},
+ {}
+ },
+};
+
+static void vhost_user_blk_server_register_types(void)
+{
+ type_register_static(&vhost_user_blk_server_info);
+}
+
+type_init(vhost_user_blk_server_register_types)
diff --git a/backends/vhost-user-blk-server.h b/backends/vhost-user-blk-server.h
new file mode 100644
index 0000000000..e572ae3801
--- /dev/null
+++ b/backends/vhost-user-blk-server.h
@@ -0,0 +1,21 @@
+#include "util/vhost-user-server.h"
+typedef struct VuBlockDev VuBlockDev;
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
+#define VHOST_USER_BLK_SERVER(obj) \
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
+
+/* vhost user block device */
+struct VuBlockDev {
+ Object parent_obj;
+ char *node_name;
+ char *unix_socket;
+ bool exit_when_panic;
+ AioContext *ctx;
+ VuServer *vu_server;
+ uint32_t blk_size;
+ BlockBackend *backend;
+ QIOChannelSocket *sioc;
+ QTAILQ_ENTRY(VuBlockDev) next;
+ struct virtio_blk_config blkcfg;
+ bool writable;
+};
diff --git a/vl.c b/vl.c
index 7dcb0879c4..d0d593c0d3 100644
--- a/vl.c
+++ b/vl.c
@@ -2572,6 +2572,10 @@ static bool object_create_initial(const char *type, QemuOpts *opts)
}
#endif
+ /* Reason: vhost-user-server property "name" */
+ if (g_str_equal(type, "vhost-user-blk-server")) {
+ return false;
+ }
/*
* Reason: filter-* property "netdev" etc.
*/
--
2.25.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 4/5] a standone-alone tool to directly share disk image file via vhost-user protocol
2020-02-12 9:51 [PATCH v3 0/5] vhost-user block device backend implementation Coiby Xu
` (2 preceding siblings ...)
2020-02-12 9:51 ` [PATCH v3 3/5] vhost-user block device backend server Coiby Xu
@ 2020-02-12 9:51 ` Coiby Xu
2020-02-13 7:05 ` Coiby Xu
2020-02-12 9:51 ` [PATCH v3 5/5] new qTest case to test the vhost-user-blk-server Coiby Xu
4 siblings, 1 reply; 7+ messages in thread
From: Coiby Xu @ 2020-02-12 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, Coiby Xu, stefanha
vhost-user-blk could have played as vhost-user backend but it only supports raw
file and don't support VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES
operations on raw file (ioctl(fd, BLKDISCARD) is only valid for real
block device).
In the future Kevin's qemu-storage-daemon will be used to replace this
tool.
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
---
Makefile | 4 +
configure | 3 +
qemu-vu.c | 252 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 259 insertions(+)
create mode 100644 qemu-vu.c
diff --git a/Makefile b/Makefile
index f0e1a2fc1d..0bfd2f1ddd 100644
--- a/Makefile
+++ b/Makefile
@@ -572,6 +572,10 @@ qemu-img.o: qemu-img-cmds.h
qemu-img$(EXESUF): qemu-img.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
qemu-nbd$(EXESUF): qemu-nbd.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
+
+ifdef CONFIG_LINUX
+qemu-vu$(EXESUF): qemu-vu.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS) libvhost-user.a
+endif
qemu-io$(EXESUF): qemu-io.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
diff --git a/configure b/configure
index 115dc38085..e87c9a5587 100755
--- a/configure
+++ b/configure
@@ -6217,6 +6217,9 @@ if test "$want_tools" = "yes" ; then
if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ; then
tools="qemu-nbd\$(EXESUF) $tools"
fi
+ if [ "$linux" = "yes" ] ; then
+ tools="qemu-vu\$(EXESUF) $tools"
+ fi
if [ "$ivshmem" = "yes" ]; then
tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
fi
diff --git a/qemu-vu.c b/qemu-vu.c
new file mode 100644
index 0000000000..dd1032b205
--- /dev/null
+++ b/qemu-vu.c
@@ -0,0 +1,252 @@
+/*
+ * Copyright (C) 2020 Coiby Xu <coiby.xu@gmail.com>
+ *
+ * standone-alone vhost-user-blk device server backend
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; under version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include <getopt.h>
+#include <libgen.h>
+#include "backends/vhost-user-blk-server.h"
+#include "block/block_int.h"
+#include "io/net-listener.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/config-file.h"
+#include "qemu/cutils.h"
+#include "qemu/main-loop.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu-common.h"
+#include "qemu-version.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/block-backend.h"
+#define QEMU_VU_OPT_CACHE 256
+#define QEMU_VU_OPT_AIO 257
+#define QEMU_VU_OBJ_ID "vu_disk"
+static QemuOptsList qemu_object_opts = {
+ .name = "object",
+ .implied_opt_name = "qom-type",
+ .head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
+ .desc = {
+ { }
+ },
+};
+static char *srcpath;
+
+static void usage(const char *name)
+{
+ (printf) (
+"Usage: %s [OPTIONS] FILE\n"
+" or: %s -L [OPTIONS]\n"
+"QEMU Vhost-user Server Utility\n"
+"\n"
+" -h, --help display this help and exit\n"
+" -V, --version output version information and exit\n"
+"\n"
+"Connection properties:\n"
+" -k, --socket=PATH path to the unix socket\n"
+"\n"
+"General purpose options:\n"
+" -e, -- exit-panic When the panic callback is called, the program\n"
+" will exit. Useful for make check-qtest.\n"
+"\n"
+"Block device options:\n"
+" -f, --format=FORMAT set image format (raw, qcow2, ...)\n"
+" -r, --read-only export read-only\n"
+" -n, --nocache disable host cache\n"
+" --cache=MODE set cache mode (none, writeback, ...)\n"
+" --aio=MODE set AIO mode (native or threads)\n"
+"\n"
+QEMU_HELP_BOTTOM "\n"
+ , name, name);
+}
+
+static void version(const char *name)
+{
+ printf(
+"%s " QEMU_FULL_VERSION "\n"
+"Written by Coiby Xu, based on qemu-nbd by Anthony Liguori\n"
+"\n"
+QEMU_COPYRIGHT "\n"
+"This is free software; see the source for copying conditions. There is NO\n"
+"warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n"
+ , name);
+}
+
+static VuBlockDev *vu_block_device;
+
+static void vus_shutdown(void)
+{
+
+ Error *local_err = NULL;
+ job_cancel_sync_all();
+ bdrv_close_all();
+ user_creatable_del(QEMU_VU_OBJ_ID, &local_err);
+}
+
+int main(int argc, char **argv)
+{
+ BlockBackend *blk;
+ BlockDriverState *bs;
+ bool readonly = false;
+ char *sockpath = NULL;
+ const char *sopt = "hVrnvek:f:";
+ struct option lopt[] = {
+ { "help", no_argument, NULL, 'h' },
+ { "version", no_argument, NULL, 'V' },
+ { "exit-panic", no_argument, NULL, 'e' },
+ { "socket", required_argument, NULL, 'k' },
+ { "read-only", no_argument, NULL, 'r' },
+ { "nocache", no_argument, NULL, 'n' },
+ { "cache", required_argument, NULL, QEMU_VU_OPT_CACHE },
+ { "aio", required_argument, NULL, QEMU_VU_OPT_AIO },
+ { "format", required_argument, NULL, 'f' },
+ { NULL, 0, NULL, 0 }
+ };
+ int ch;
+ int opt_ind = 0;
+ int flags = BDRV_O_RDWR;
+ bool seen_cache = false;
+ bool seen_aio = false;
+ const char *fmt = NULL;
+ Error *local_err = NULL;
+ QDict *options = NULL;
+ bool writethrough = true;
+ bool exit_when_panic = false;
+
+ error_init(argv[0]);
+
+ module_call_init(MODULE_INIT_QOM);
+ qemu_init_exec_dir(argv[0]);
+
+ while ((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {
+ switch (ch) {
+ case 'e':
+ exit_when_panic = true;
+ break;
+ case 'n':
+ optarg = (char *) "none";
+ /* fallthrough */
+ case QEMU_VU_OPT_CACHE:
+ if (seen_cache) {
+ error_report("-n and --cache can only be specified once");
+ exit(EXIT_FAILURE);
+ }
+ seen_cache = true;
+ if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) == -1) {
+ error_report("Invalid cache mode `%s'", optarg);
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case QEMU_VU_OPT_AIO:
+ if (seen_aio) {
+ error_report("--aio can only be specified once");
+ exit(EXIT_FAILURE);
+ }
+ seen_aio = true;
+ if (!strcmp(optarg, "native")) {
+ flags |= BDRV_O_NATIVE_AIO;
+ } else if (!strcmp(optarg, "threads")) {
+ /* this is the default */
+ } else {
+ error_report("invalid aio mode `%s'", optarg);
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'r':
+ readonly = true;
+ flags &= ~BDRV_O_RDWR;
+ break;
+ case 'k':
+ sockpath = optarg;
+ if (sockpath[0] != '/') {
+ error_report("socket path must be absolute");
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'f':
+ fmt = optarg;
+ break;
+ case 'V':
+ version(argv[0]);
+ exit(0);
+ break;
+ case 'h':
+ usage(argv[0]);
+ exit(0);
+ break;
+ case '?':
+ error_report("Try `%s --help' for more information.", argv[0]);
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ if ((argc - optind) != 1) {
+ error_report("Invalid number of arguments");
+ error_printf("Try `%s --help' for more information.\n", argv[0]);
+ exit(EXIT_FAILURE);
+ }
+ if (qemu_init_main_loop(&local_err)) {
+ error_report_err(local_err);
+ exit(EXIT_FAILURE);
+ }
+ bdrv_init();
+
+ srcpath = argv[optind];
+ if (fmt) {
+ options = qdict_new();
+ qdict_put_str(options, "driver", fmt);
+ }
+ blk = blk_new_open(srcpath, NULL, options, flags, &local_err);
+
+ if (!blk) {
+ error_reportf_err(local_err, "Failed to blk_new_open '%s': ",
+ argv[optind]);
+ exit(EXIT_FAILURE);
+ }
+ bs = blk_bs(blk);
+
+ char buf[300];
+ snprintf(buf, 300, "%s,id=%s,node-name=%s,unix-socket=%s,writable=%s",
+ TYPE_VHOST_USER_BLK_SERVER, QEMU_VU_OBJ_ID, bdrv_get_node_name(bs),
+ sockpath, !readonly ? "on" : "off");
+ /* While calling user_creatable_del, 'object' group is required */
+ qemu_add_opts(&qemu_object_opts);
+ QemuOpts *opts = qemu_opts_parse(&qemu_object_opts, buf, true, &local_err);
+ if (local_err) {
+ error_report_err(local_err);
+ goto error;
+ }
+
+ Object *obj = user_creatable_add_opts(opts, &local_err);
+
+ if (local_err) {
+ error_report_err(local_err);
+ goto error;
+ }
+
+ vu_block_device = VHOST_USER_BLK_SERVER(obj);
+ vu_block_device->exit_when_panic = exit_when_panic;
+
+ do {
+ main_loop_wait(false);
+ } while (!vu_block_device->exit_when_panic || !vu_block_device->vu_server->close);
+
+ error:
+ vus_shutdown();
+ exit(EXIT_SUCCESS);
+}
--
2.25.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 5/5] new qTest case to test the vhost-user-blk-server
2020-02-12 9:51 [PATCH v3 0/5] vhost-user block device backend implementation Coiby Xu
` (3 preceding siblings ...)
2020-02-12 9:51 ` [PATCH v3 4/5] a standone-alone tool to directly share disk image file via vhost-user protocol Coiby Xu
@ 2020-02-12 9:51 ` Coiby Xu
4 siblings, 0 replies; 7+ messages in thread
From: Coiby Xu @ 2020-02-12 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, Coiby Xu, stefanha
This test case has the same tests as tests/virtio-blk-test.c except for
tests have block_resize.
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
---
tests/libqos/vhost-user-blk.c | 126 ++++++
tests/libqos/vhost-user-blk.h | 44 +++
tests/vhost-user-blk-test.c | 694 ++++++++++++++++++++++++++++++++++
3 files changed, 864 insertions(+)
create mode 100644 tests/libqos/vhost-user-blk.c
create mode 100644 tests/libqos/vhost-user-blk.h
create mode 100644 tests/vhost-user-blk-test.c
diff --git a/tests/libqos/vhost-user-blk.c b/tests/libqos/vhost-user-blk.c
new file mode 100644
index 0000000000..ec46b7ddb4
--- /dev/null
+++ b/tests/libqos/vhost-user-blk.c
@@ -0,0 +1,126 @@
+/*
+ * libqos driver framework
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qemu/module.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "libqos/qgraph.h"
+#include "libqos/vhost-user-blk.h"
+
+#define PCI_SLOT 0x04
+#define PCI_FN 0x00
+
+/* virtio-blk-device */
+static void *qvhost_user_blk_get_driver(QVhostUserBlk *v_blk,
+ const char *interface)
+{
+ if (!g_strcmp0(interface, "vhost-user-blk")) {
+ return v_blk;
+ }
+ if (!g_strcmp0(interface, "virtio")) {
+ return v_blk->vdev;
+ }
+
+ fprintf(stderr, "%s not present in vhost-user-blk-device\n", interface);
+ g_assert_not_reached();
+}
+
+static void *qvhost_user_blk_device_get_driver(void *object,
+ const char *interface)
+{
+ QVhostUserBlkDevice *v_blk = object;
+ return qvhost_user_blk_get_driver(&v_blk->blk, interface);
+}
+
+static void *vhost_user_blk_device_create(void *virtio_dev,
+ QGuestAllocator *t_alloc,
+ void *addr)
+{
+ QVhostUserBlkDevice *vhost_user_blk = g_new0(QVhostUserBlkDevice, 1);
+ QVhostUserBlk *interface = &vhost_user_blk->blk;
+
+ interface->vdev = virtio_dev;
+
+ vhost_user_blk->obj.get_driver = qvhost_user_blk_device_get_driver;
+
+ return &vhost_user_blk->obj;
+}
+
+/* virtio-blk-pci */
+static void *qvhost_user_blk_pci_get_driver(void *object, const char *interface)
+{
+ QVhostUserBlkPCI *v_blk = object;
+ if (!g_strcmp0(interface, "pci-device")) {
+ return v_blk->pci_vdev.pdev;
+ }
+ return qvhost_user_blk_get_driver(&v_blk->blk, interface);
+}
+
+static void *vhost_user_blk_pci_create(void *pci_bus, QGuestAllocator *t_alloc,
+ void *addr)
+{
+ QVhostUserBlkPCI *vhost_user_blk = g_new0(QVhostUserBlkPCI, 1);
+ QVhostUserBlk *interface = &vhost_user_blk->blk;
+ QOSGraphObject *obj = &vhost_user_blk->pci_vdev.obj;
+
+ virtio_pci_init(&vhost_user_blk->pci_vdev, pci_bus, addr);
+ interface->vdev = &vhost_user_blk->pci_vdev.vdev;
+
+ g_assert_cmphex(interface->vdev->device_type, ==, VIRTIO_ID_BLOCK);
+
+ obj->get_driver = qvhost_user_blk_pci_get_driver;
+
+ return obj;
+}
+
+static void vhost_user_blk_register_nodes(void)
+{
+ /*
+ * FIXME: every test using these two nodes needs to setup a
+ * -drive,id=drive0 otherwise QEMU is not going to start.
+ * Therefore, we do not include "produces" edge for virtio
+ * and pci-device yet.
+ */
+
+ char *arg = g_strdup_printf("id=drv0,chardev=char1,addr=%x.%x",
+ PCI_SLOT, PCI_FN);
+
+ QPCIAddress addr = {
+ .devfn = QPCI_DEVFN(PCI_SLOT, PCI_FN),
+ };
+
+ QOSGraphEdgeOptions opts = { };
+
+ /* virtio-blk-device */
+ /** opts.extra_device_opts = "drive=drive0"; */
+ qos_node_create_driver("vhost-user-blk-device", vhost_user_blk_device_create);
+ qos_node_consumes("vhost-user-blk-device", "virtio-bus", &opts);
+ qos_node_produces("vhost-user-blk-device", "vhost-user-blk");
+
+ /* virtio-blk-pci */
+ opts.extra_device_opts = arg;
+ add_qpci_address(&opts, &addr);
+ qos_node_create_driver("vhost-user-blk-pci", vhost_user_blk_pci_create);
+ qos_node_consumes("vhost-user-blk-pci", "pci-bus", &opts);
+ qos_node_produces("vhost-user-blk-pci", "vhost-user-blk");
+
+ g_free(arg);
+}
+
+libqos_init(vhost_user_blk_register_nodes);
diff --git a/tests/libqos/vhost-user-blk.h b/tests/libqos/vhost-user-blk.h
new file mode 100644
index 0000000000..ef4ef09cca
--- /dev/null
+++ b/tests/libqos/vhost-user-blk.h
@@ -0,0 +1,44 @@
+/*
+ * libqos driver framework
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef TESTS_LIBQOS_VHOST_USER_BLK_H
+#define TESTS_LIBQOS_VHOST_USER_BLK_H
+
+#include "libqos/qgraph.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+
+typedef struct QVhostUserBlk QVhostUserBlk;
+typedef struct QVhostUserBlkPCI QVhostUserBlkPCI;
+typedef struct QVhostUserBlkDevice QVhostUserBlkDevice;
+
+struct QVhostUserBlk {
+ QVirtioDevice *vdev;
+};
+
+struct QVhostUserBlkPCI {
+ QVirtioPCIDevice pci_vdev;
+ QVhostUserBlk blk;
+};
+
+struct QVhostUserBlkDevice {
+ QOSGraphObject obj;
+ QVhostUserBlk blk;
+};
+
+#endif
diff --git a/tests/vhost-user-blk-test.c b/tests/vhost-user-blk-test.c
new file mode 100644
index 0000000000..528f034b55
--- /dev/null
+++ b/tests/vhost-user-blk-test.c
@@ -0,0 +1,694 @@
+/*
+ * QTest testcase for VirtIO Block Device
+ *
+ * Copyright (c) 2014 SUSE LINUX Products GmbH
+ * Copyright (c) 2014 Marc Marí
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+#include "qemu/bswap.h"
+#include "qemu/module.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "standard-headers/linux/virtio_pci.h"
+#include "libqos/qgraph.h"
+#include "libqos/vhost-user-blk.h"
+#include "libqos/libqos-pc.h"
+
+/* TODO actually test the results and get rid of this */
+#define qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
+
+#define TEST_IMAGE_SIZE (64 * 1024 * 1024)
+#define QVIRTIO_BLK_TIMEOUT_US (30 * 1000 * 1000)
+#define PCI_SLOT_HP 0x06
+
+typedef struct QVirtioBlkReq {
+ uint32_t type;
+ uint32_t ioprio;
+ uint64_t sector;
+ char *data;
+ uint8_t status;
+} QVirtioBlkReq;
+
+
+#ifdef HOST_WORDS_BIGENDIAN
+static const bool host_is_big_endian = true;
+#else
+static const bool host_is_big_endian; /* false */
+#endif
+
+static inline void virtio_blk_fix_request(QVirtioDevice *d, QVirtioBlkReq *req)
+{
+ if (qvirtio_is_big_endian(d) != host_is_big_endian) {
+ req->type = bswap32(req->type);
+ req->ioprio = bswap32(req->ioprio);
+ req->sector = bswap64(req->sector);
+ }
+}
+
+
+static inline void virtio_blk_fix_dwz_hdr(QVirtioDevice *d,
+ struct virtio_blk_discard_write_zeroes *dwz_hdr)
+{
+ if (qvirtio_is_big_endian(d) != host_is_big_endian) {
+ dwz_hdr->sector = bswap64(dwz_hdr->sector);
+ dwz_hdr->num_sectors = bswap32(dwz_hdr->num_sectors);
+ dwz_hdr->flags = bswap32(dwz_hdr->flags);
+ }
+}
+
+static uint64_t virtio_blk_request(QGuestAllocator *alloc, QVirtioDevice *d,
+ QVirtioBlkReq *req, uint64_t data_size)
+{
+ uint64_t addr;
+ uint8_t status = 0xFF;
+
+ switch (req->type) {
+ case VIRTIO_BLK_T_IN:
+ case VIRTIO_BLK_T_OUT:
+ g_assert_cmpuint(data_size % 512, ==, 0);
+ break;
+ case VIRTIO_BLK_T_DISCARD:
+ case VIRTIO_BLK_T_WRITE_ZEROES:
+ g_assert_cmpuint(data_size %
+ sizeof(struct virtio_blk_discard_write_zeroes), ==, 0);
+ break;
+ default:
+ g_assert_cmpuint(data_size, ==, 0);
+ }
+
+ addr = guest_alloc(alloc, sizeof(*req) + data_size);
+
+ virtio_blk_fix_request(d, req);
+
+ memwrite(addr, req, 16);
+ memwrite(addr + 16, req->data, data_size);
+ memwrite(addr + 16 + data_size, &status, sizeof(status));
+
+ return addr;
+}
+
+/* Returns the request virtqueue so the caller can perform further tests */
+static QVirtQueue *test_basic(QVirtioDevice *dev, QGuestAllocator *alloc)
+{
+ QVirtioBlkReq req;
+ uint64_t req_addr;
+ uint64_t capacity;
+ uint64_t features;
+ uint32_t free_head;
+ uint8_t status;
+ char *data;
+ QTestState *qts = global_qtest;
+ QVirtQueue *vq;
+
+ features = qvirtio_get_features(dev);
+ features = features & ~(QVIRTIO_F_BAD_FEATURE |
+ (1u << VIRTIO_RING_F_INDIRECT_DESC) |
+ (1u << VIRTIO_RING_F_EVENT_IDX) |
+ (1u << VIRTIO_BLK_F_SCSI));
+ qvirtio_set_features(dev, features);
+
+ capacity = qvirtio_config_readq(dev, 0);
+ g_assert_cmpint(capacity, ==, TEST_IMAGE_SIZE / 512);
+
+ vq = qvirtqueue_setup(dev, alloc, 0);
+
+ qvirtio_set_driver_ok(dev);
+
+ /* Write and read with 3 descriptor layout */
+ /* Write request */
+ req.type = VIRTIO_BLK_T_OUT;
+ req.ioprio = 1;
+ req.sector = 0;
+ req.data = g_malloc0(512);
+ strcpy(req.data, "TEST");
+
+ req_addr = virtio_blk_request(alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 512, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ guest_free(alloc, req_addr);
+
+ /* Read request */
+ req.type = VIRTIO_BLK_T_IN;
+ req.ioprio = 1;
+ req.sector = 0;
+ req.data = g_malloc0(512);
+
+ req_addr = virtio_blk_request(alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 512, true, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ data = g_malloc0(512);
+ memread(req_addr + 16, data, 512);
+ g_assert_cmpstr(data, ==, "TEST");
+ g_free(data);
+
+ guest_free(alloc, req_addr);
+
+ if (features & (1u << VIRTIO_BLK_F_WRITE_ZEROES)) {
+ struct virtio_blk_discard_write_zeroes dwz_hdr;
+ void *expected;
+
+ /*
+ * WRITE_ZEROES request on the same sector of previous test where
+ * we wrote "TEST".
+ */
+ req.type = VIRTIO_BLK_T_WRITE_ZEROES;
+ req.data = (char *) &dwz_hdr;
+ dwz_hdr.sector = 0;
+ dwz_hdr.num_sectors = 1;
+ dwz_hdr.flags = 0;
+
+ virtio_blk_fix_dwz_hdr(dev, &dwz_hdr);
+
+ req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr));
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, sizeof(dwz_hdr), false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16 + sizeof(dwz_hdr), 1, true,
+ false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 16 + sizeof(dwz_hdr));
+ g_assert_cmpint(status, ==, 0);
+
+ guest_free(alloc, req_addr);
+
+ /* Read request to check if the sector contains all zeroes */
+ req.type = VIRTIO_BLK_T_IN;
+ req.ioprio = 1;
+ req.sector = 0;
+ req.data = g_malloc0(512);
+
+ req_addr = virtio_blk_request(alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 512, true, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ data = g_malloc(512);
+ expected = g_malloc0(512);
+ memread(req_addr + 16, data, 512);
+ g_assert_cmpmem(data, 512, expected, 512);
+ g_free(expected);
+ g_free(data);
+
+ guest_free(alloc, req_addr);
+ }
+
+ if (features & (1u << VIRTIO_BLK_F_DISCARD)) {
+ struct virtio_blk_discard_write_zeroes dwz_hdr;
+
+ req.type = VIRTIO_BLK_T_DISCARD;
+ req.data = (char *) &dwz_hdr;
+ dwz_hdr.sector = 0;
+ dwz_hdr.num_sectors = 1;
+ dwz_hdr.flags = 0;
+
+ virtio_blk_fix_dwz_hdr(dev, &dwz_hdr);
+
+ req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr));
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, sizeof(dwz_hdr), false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16 + sizeof(dwz_hdr),
+ 1, true, false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 16 + sizeof(dwz_hdr));
+ g_assert_cmpint(status, ==, 0);
+
+ guest_free(alloc, req_addr);
+ }
+
+ if (features & (1u << VIRTIO_F_ANY_LAYOUT)) {
+ /* Write and read with 2 descriptor layout */
+ /* Write request */
+ req.type = VIRTIO_BLK_T_OUT;
+ req.ioprio = 1;
+ req.sector = 1;
+ req.data = g_malloc0(512);
+ strcpy(req.data, "TEST");
+
+ req_addr = virtio_blk_request(alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 528, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ guest_free(alloc, req_addr);
+
+ /* Read request */
+ req.type = VIRTIO_BLK_T_IN;
+ req.ioprio = 1;
+ req.sector = 1;
+ req.data = g_malloc0(512);
+
+ req_addr = virtio_blk_request(alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 513, true, false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ data = g_malloc0(512);
+ memread(req_addr + 16, data, 512);
+ g_assert_cmpstr(data, ==, "TEST");
+ g_free(data);
+
+ guest_free(alloc, req_addr);
+ }
+
+ return vq;
+}
+
+static void basic(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+ QVhostUserBlk *blk_if = obj;
+ QVirtQueue *vq;
+
+ vq = test_basic(blk_if->vdev, t_alloc);
+ qvirtqueue_cleanup(blk_if->vdev->bus, vq, t_alloc);
+
+}
+
+static void indirect(void *obj, void *u_data, QGuestAllocator *t_alloc)
+{
+ QVirtQueue *vq;
+ QVhostUserBlk *blk_if = obj;
+ QVirtioDevice *dev = blk_if->vdev;
+ QVirtioBlkReq req;
+ QVRingIndirectDesc *indirect;
+ uint64_t req_addr;
+ uint64_t capacity;
+ uint64_t features;
+ uint32_t free_head;
+ uint8_t status;
+ char *data;
+ QTestState *qts = global_qtest;
+
+ features = qvirtio_get_features(dev);
+ g_assert_cmphex(features & (1u << VIRTIO_RING_F_INDIRECT_DESC), !=, 0);
+ features = features & ~(QVIRTIO_F_BAD_FEATURE |
+ (1u << VIRTIO_RING_F_EVENT_IDX) |
+ (1u << VIRTIO_BLK_F_SCSI));
+ qvirtio_set_features(dev, features);
+
+ capacity = qvirtio_config_readq(dev, 0);
+ g_assert_cmpint(capacity, ==, TEST_IMAGE_SIZE / 512);
+
+ vq = qvirtqueue_setup(dev, t_alloc, 0);
+ qvirtio_set_driver_ok(dev);
+
+ /* Write request */
+ req.type = VIRTIO_BLK_T_OUT;
+ req.ioprio = 1;
+ req.sector = 0;
+ req.data = g_malloc0(512);
+ strcpy(req.data, "TEST");
+
+ req_addr = virtio_blk_request(t_alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ indirect = qvring_indirect_desc_setup(qts, dev, t_alloc, 2);
+ qvring_indirect_desc_add(dev, qts, indirect, req_addr, 528, false);
+ qvring_indirect_desc_add(dev, qts, indirect, req_addr + 528, 1, true);
+ free_head = qvirtqueue_add_indirect(qts, vq, indirect);
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ g_free(indirect);
+ guest_free(t_alloc, req_addr);
+
+ /* Read request */
+ req.type = VIRTIO_BLK_T_IN;
+ req.ioprio = 1;
+ req.sector = 0;
+ req.data = g_malloc0(512);
+ strcpy(req.data, "TEST");
+
+ req_addr = virtio_blk_request(t_alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ indirect = qvring_indirect_desc_setup(qts, dev, t_alloc, 2);
+ qvring_indirect_desc_add(dev, qts, indirect, req_addr, 16, false);
+ qvring_indirect_desc_add(dev, qts, indirect, req_addr + 16, 513, true);
+ free_head = qvirtqueue_add_indirect(qts, vq, indirect);
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ data = g_malloc0(512);
+ memread(req_addr + 16, data, 512);
+ g_assert_cmpstr(data, ==, "TEST");
+ g_free(data);
+
+ g_free(indirect);
+ guest_free(t_alloc, req_addr);
+ qvirtqueue_cleanup(dev->bus, vq, t_alloc);
+}
+
+
+static void idx(void *obj, void *u_data, QGuestAllocator *t_alloc)
+{
+ QVirtQueue *vq;
+ QVhostUserBlkPCI *blk = obj;
+ QVirtioPCIDevice *pdev = &blk->pci_vdev;
+ QVirtioDevice *dev = &pdev->vdev;
+ QVirtioBlkReq req;
+ uint64_t req_addr;
+ uint64_t capacity;
+ uint64_t features;
+ uint32_t free_head;
+ uint32_t write_head;
+ uint32_t desc_idx;
+ uint8_t status;
+ char *data;
+ QOSGraphObject *blk_object = obj;
+ QPCIDevice *pci_dev = blk_object->get_driver(blk_object, "pci-device");
+ QTestState *qts = global_qtest;
+
+ if (qpci_check_buggy_msi(pci_dev)) {
+ return;
+ }
+
+ qpci_msix_enable(pdev->pdev);
+ qvirtio_pci_set_msix_configuration_vector(pdev, t_alloc, 0);
+
+ features = qvirtio_get_features(dev);
+ features = features & ~(QVIRTIO_F_BAD_FEATURE |
+ (1u << VIRTIO_RING_F_INDIRECT_DESC) |
+ (1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
+ (1u << VIRTIO_BLK_F_SCSI));
+ qvirtio_set_features(dev, features);
+
+ capacity = qvirtio_config_readq(dev, 0);
+ g_assert_cmpint(capacity, ==, TEST_IMAGE_SIZE / 512);
+
+ vq = qvirtqueue_setup(dev, t_alloc, 0);
+ qvirtqueue_pci_msix_setup(pdev, (QVirtQueuePCI *)vq, t_alloc, 1);
+
+ qvirtio_set_driver_ok(dev);
+
+ /* Write request */
+ req.type = VIRTIO_BLK_T_OUT;
+ req.ioprio = 1;
+ req.sector = 0;
+ req.data = g_malloc0(512);
+ strcpy(req.data, "TEST");
+
+ req_addr = virtio_blk_request(t_alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 512, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+
+ /* Write request */
+ req.type = VIRTIO_BLK_T_OUT;
+ req.ioprio = 1;
+ req.sector = 1;
+ req.data = g_malloc0(512);
+ strcpy(req.data, "TEST");
+
+ req_addr = virtio_blk_request(t_alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ /* Notify after processing the third request */
+ qvirtqueue_set_used_event(qts, vq, 2);
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 512, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+ qvirtqueue_kick(qts, dev, vq, free_head);
+ write_head = free_head;
+
+ /* No notification expected */
+ status = qvirtio_wait_status_byte_no_isr(qts, dev,
+ vq, req_addr + 528,
+ QVIRTIO_BLK_TIMEOUT_US);
+ g_assert_cmpint(status, ==, 0);
+
+ guest_free(t_alloc, req_addr);
+
+ /* Read request */
+ req.type = VIRTIO_BLK_T_IN;
+ req.ioprio = 1;
+ req.sector = 1;
+ req.data = g_malloc0(512);
+
+ req_addr = virtio_blk_request(t_alloc, dev, &req, 512);
+
+ g_free(req.data);
+
+ free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+ qvirtqueue_add(qts, vq, req_addr + 16, 512, true, true);
+ qvirtqueue_add(qts, vq, req_addr + 528, 1, true, false);
+
+ qvirtqueue_kick(qts, dev, vq, free_head);
+
+ /* We get just one notification for both requests */
+ qvirtio_wait_used_elem(qts, dev, vq, write_head, NULL,
+ QVIRTIO_BLK_TIMEOUT_US);
+ g_assert(qvirtqueue_get_buf(qts, vq, &desc_idx, NULL));
+ g_assert_cmpint(desc_idx, ==, free_head);
+
+ status = readb(req_addr + 528);
+ g_assert_cmpint(status, ==, 0);
+
+ data = g_malloc0(512);
+ memread(req_addr + 16, data, 512);
+ g_assert_cmpstr(data, ==, "TEST");
+ g_free(data);
+
+ guest_free(t_alloc, req_addr);
+
+ /* End test */
+ qpci_msix_disable(pdev->pdev);
+
+ qvirtqueue_cleanup(dev->bus, vq, t_alloc);
+}
+
+static void pci_hotplug(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+ QVirtioPCIDevice *dev1 = obj;
+ QVirtioPCIDevice *dev;
+ QTestState *qts = dev1->pdev->bus->qts;
+
+ /* plug secondary disk */
+ qtest_qmp_device_add(qts, "vhost-user-blk-pci", "drv1",
+ "{'addr': %s, 'chardev': 'char2'}",
+ stringify(PCI_SLOT_HP) ".0");
+
+ dev = virtio_pci_new(dev1->pdev->bus,
+ &(QPCIAddress) { .devfn = QPCI_DEVFN(PCI_SLOT_HP, 0)
+ });
+ g_assert_nonnull(dev);
+ g_assert_cmpint(dev->vdev.device_type, ==, VIRTIO_ID_BLOCK);
+ qvirtio_pci_device_disable(dev);
+ qos_object_destroy((QOSGraphObject *)dev);
+
+ /* unplug secondary disk */
+ qpci_unplug_acpi_device_test(qts, "drv1", PCI_SLOT_HP);
+}
+
+/*
+ * Check that setting the vring addr on a non-existent virtqueue does
+ * not crash.
+ */
+static void test_nonexistent_virtqueue(void *obj, void *data,
+ QGuestAllocator *t_alloc)
+{
+ QVhostUserBlkPCI *blk = obj;
+ QVirtioPCIDevice *pdev = &blk->pci_vdev;
+ QPCIBar bar0;
+ QPCIDevice *dev;
+
+ dev = qpci_device_find(pdev->pdev->bus, QPCI_DEVFN(4, 0));
+ g_assert(dev != NULL);
+ qpci_device_enable(dev);
+
+ bar0 = qpci_iomap(dev, 0, NULL);
+
+ qpci_io_writeb(dev, bar0, VIRTIO_PCI_QUEUE_SEL, 2);
+ qpci_io_writel(dev, bar0, VIRTIO_PCI_QUEUE_PFN, 1);
+
+ g_free(dev);
+}
+
+static const char *qtest_qemu_vu_binary(void)
+{
+ const char *qemu_vu_bin;
+
+ qemu_vu_bin = getenv("QTEST_QEMU_VU_BINARY");
+ if (!qemu_vu_bin) {
+ fprintf(stderr, "Environment variable QTEST_QEMU_VU_BINARY required\n");
+ exit(0);
+ }
+
+ return qemu_vu_bin;
+}
+
+static void drive_destroy(void *path)
+{
+ unlink(path);
+ g_free(path);
+ qos_invalidate_command_line();
+}
+
+
+static char *drive_create(void)
+{
+ int fd, ret;
+ /** vhost-user-blk won't recognize drive located in /tmp */
+ char *t_path = g_strdup("qtest.XXXXXX");
+
+ /** Create a temporary raw image */
+ fd = mkstemp(t_path);
+ g_assert_cmpint(fd, >=, 0);
+ ret = ftruncate(fd, TEST_IMAGE_SIZE);
+ g_assert_cmpint(ret, ==, 0);
+ close(fd);
+
+ g_test_queue_destroy(drive_destroy, t_path);
+ return t_path;
+}
+
+
+
+static void start_vhost_user_blk(const char *img_path, const char *sock_path)
+{
+ const char *vhost_user_blk_bin = qtest_qemu_vu_binary();
+ /*
+ * "qemu-vu -e" will exit when the client disconnects thus the launched
+ * qemu-vu process will not block scripts/tap-driver.pl
+ */
+ gchar *command = g_strdup_printf("exec %s "
+ "-e "
+ "-k %s "
+ "-f raw "
+ "%s",
+ vhost_user_blk_bin,
+ sock_path, img_path);
+ g_test_message("starting vhost-user backend: %s", command);
+ pid_t pid = fork();
+ if (pid == 0) {
+ execlp("/bin/sh", "sh", "-c", command, NULL);
+ exit(1);
+ }
+ /*
+ * make sure qemu-vu i.e. socket server is started before tests
+ * otherwise qemu will complain,
+ * "Failed to connect socket ... Connection refused"
+ */
+ g_usleep(G_USEC_PER_SEC);
+}
+
+static void *vhost_user_blk_test_setup(GString *cmd_line, void *arg)
+{
+ /* create image file */
+ const char *img_path = drive_create();
+ const char *sock_path = "/tmp/vhost-user-blk_vhost.socket";
+ start_vhost_user_blk(img_path, sock_path);
+ /* "-chardev socket,id=char2" is used for pci_hotplug*/
+ g_string_append_printf(cmd_line,
+ " -object memory-backend-memfd,id=mem,size=128M,share=on -numa node,memdev=mem "
+ "-chardev socket,id=char1,path=%s "
+ "-chardev socket,id=char2,path=%s",
+ sock_path, sock_path);
+ return arg;
+}
+
+static void register_vhost_user_blk_test(void)
+{
+ QOSGraphTestOptions opts = {
+ .before = vhost_user_blk_test_setup,
+ };
+
+ /*
+ * tests for vhost-user-blk and vhost-user-blk-pci
+ * The tests are borrowed from tests/virtio-blk-test.c. But some tests
+ * regarding block_resize don't work for vhost-user-blk.
+ * vhost-user-blk device doesn't have -drive, so tests containing
+ * block_resize are also abandoned,
+ * - config
+ * - resize
+ */
+ qos_add_test("basic", "vhost-user-blk", basic, &opts);
+ qos_add_test("indirect", "vhost-user-blk", indirect, &opts);
+ qos_add_test("idx", "vhost-user-blk-pci", idx, &opts);
+ qos_add_test("nxvirtq", "vhost-user-blk-pci",
+ test_nonexistent_virtqueue, &opts);
+ qos_add_test("hotplug", "vhost-user-blk-pci", pci_hotplug, &opts);
+}
+
+libqos_init(register_vhost_user_blk_test);
--
2.25.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 4/5] a standone-alone tool to directly share disk image file via vhost-user protocol
2020-02-12 9:51 ` [PATCH v3 4/5] a standone-alone tool to directly share disk image file via vhost-user protocol Coiby Xu
@ 2020-02-13 7:05 ` Coiby Xu
0 siblings, 0 replies; 7+ messages in thread
From: Coiby Xu @ 2020-02-13 7:05 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, bharatlkmlkvm, stefanha
I forgot to put backends/vhost-user-blk-server.o as dependency for
qemu-vu target in Makefile,
qemu-img$(EXESUF): qemu-img.o $(authz-obj-y) $(block-obj-y)
$(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
qemu-nbd$(EXESUF): qemu-nbd.o $(authz-obj-y) $(block-obj-y)
$(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
+
+ifdef CONFIG_LINUX
+qemu-vu$(EXESUF): qemu-vu.o backends/vhost-user-blk-server.o
$(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y)
$(COMMON_LDADDS) libvhost-user.a
+endif
qemu-io$(EXESUF): qemu-io.o $(authz-obj-y) $(block-obj-y)
$(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
I also noticed in the latest version of QEMU, `make check-qtest`
somehow doesn't run qos-test. If you need to run vhost-user-blk
testsuite, please execute the following command after applying the
above fix and the 5th patch,
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_VU_BINARY=./qemu-vu
tests/qos-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl
--test-name="qos-test"
On Wed, Feb 12, 2020 at 5:52 PM Coiby Xu <coiby.xu@gmail.com> wrote:
>
> vhost-user-blk could have played as vhost-user backend but it only supports raw
> file and don't support VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES
> operations on raw file (ioctl(fd, BLKDISCARD) is only valid for real
> block device).
>
> In the future Kevin's qemu-storage-daemon will be used to replace this
> tool.
>
> Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
> ---
> Makefile | 4 +
> configure | 3 +
> qemu-vu.c | 252 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 259 insertions(+)
> create mode 100644 qemu-vu.c
>
> diff --git a/Makefile b/Makefile
> index f0e1a2fc1d..0bfd2f1ddd 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -572,6 +572,10 @@ qemu-img.o: qemu-img-cmds.h
>
> qemu-img$(EXESUF): qemu-img.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
> qemu-nbd$(EXESUF): qemu-nbd.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
> +
> +ifdef CONFIG_LINUX
> +qemu-vu$(EXESUF): qemu-vu.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS) libvhost-user.a
> +endif
> qemu-io$(EXESUF): qemu-io.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
>
> qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
> diff --git a/configure b/configure
> index 115dc38085..e87c9a5587 100755
> --- a/configure
> +++ b/configure
> @@ -6217,6 +6217,9 @@ if test "$want_tools" = "yes" ; then
> if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ; then
> tools="qemu-nbd\$(EXESUF) $tools"
> fi
> + if [ "$linux" = "yes" ] ; then
> + tools="qemu-vu\$(EXESUF) $tools"
> + fi
> if [ "$ivshmem" = "yes" ]; then
> tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
> fi
> diff --git a/qemu-vu.c b/qemu-vu.c
> new file mode 100644
> index 0000000000..dd1032b205
> --- /dev/null
> +++ b/qemu-vu.c
> @@ -0,0 +1,252 @@
> +/*
> + * Copyright (C) 2020 Coiby Xu <coiby.xu@gmail.com>
> + *
> + * standone-alone vhost-user-blk device server backend
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; under version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include <getopt.h>
> +#include <libgen.h>
> +#include "backends/vhost-user-blk-server.h"
> +#include "block/block_int.h"
> +#include "io/net-listener.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qstring.h"
> +#include "qemu/config-file.h"
> +#include "qemu/cutils.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/module.h"
> +#include "qemu/option.h"
> +#include "qemu-common.h"
> +#include "qemu-version.h"
> +#include "qom/object_interfaces.h"
> +#include "sysemu/block-backend.h"
> +#define QEMU_VU_OPT_CACHE 256
> +#define QEMU_VU_OPT_AIO 257
> +#define QEMU_VU_OBJ_ID "vu_disk"
> +static QemuOptsList qemu_object_opts = {
> + .name = "object",
> + .implied_opt_name = "qom-type",
> + .head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
> + .desc = {
> + { }
> + },
> +};
> +static char *srcpath;
> +
> +static void usage(const char *name)
> +{
> + (printf) (
> +"Usage: %s [OPTIONS] FILE\n"
> +" or: %s -L [OPTIONS]\n"
> +"QEMU Vhost-user Server Utility\n"
> +"\n"
> +" -h, --help display this help and exit\n"
> +" -V, --version output version information and exit\n"
> +"\n"
> +"Connection properties:\n"
> +" -k, --socket=PATH path to the unix socket\n"
> +"\n"
> +"General purpose options:\n"
> +" -e, -- exit-panic When the panic callback is called, the program\n"
> +" will exit. Useful for make check-qtest.\n"
> +"\n"
> +"Block device options:\n"
> +" -f, --format=FORMAT set image format (raw, qcow2, ...)\n"
> +" -r, --read-only export read-only\n"
> +" -n, --nocache disable host cache\n"
> +" --cache=MODE set cache mode (none, writeback, ...)\n"
> +" --aio=MODE set AIO mode (native or threads)\n"
> +"\n"
> +QEMU_HELP_BOTTOM "\n"
> + , name, name);
> +}
> +
> +static void version(const char *name)
> +{
> + printf(
> +"%s " QEMU_FULL_VERSION "\n"
> +"Written by Coiby Xu, based on qemu-nbd by Anthony Liguori\n"
> +"\n"
> +QEMU_COPYRIGHT "\n"
> +"This is free software; see the source for copying conditions. There is NO\n"
> +"warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n"
> + , name);
> +}
> +
> +static VuBlockDev *vu_block_device;
> +
> +static void vus_shutdown(void)
> +{
> +
> + Error *local_err = NULL;
> + job_cancel_sync_all();
> + bdrv_close_all();
> + user_creatable_del(QEMU_VU_OBJ_ID, &local_err);
> +}
> +
> +int main(int argc, char **argv)
> +{
> + BlockBackend *blk;
> + BlockDriverState *bs;
> + bool readonly = false;
> + char *sockpath = NULL;
> + const char *sopt = "hVrnvek:f:";
> + struct option lopt[] = {
> + { "help", no_argument, NULL, 'h' },
> + { "version", no_argument, NULL, 'V' },
> + { "exit-panic", no_argument, NULL, 'e' },
> + { "socket", required_argument, NULL, 'k' },
> + { "read-only", no_argument, NULL, 'r' },
> + { "nocache", no_argument, NULL, 'n' },
> + { "cache", required_argument, NULL, QEMU_VU_OPT_CACHE },
> + { "aio", required_argument, NULL, QEMU_VU_OPT_AIO },
> + { "format", required_argument, NULL, 'f' },
> + { NULL, 0, NULL, 0 }
> + };
> + int ch;
> + int opt_ind = 0;
> + int flags = BDRV_O_RDWR;
> + bool seen_cache = false;
> + bool seen_aio = false;
> + const char *fmt = NULL;
> + Error *local_err = NULL;
> + QDict *options = NULL;
> + bool writethrough = true;
> + bool exit_when_panic = false;
> +
> + error_init(argv[0]);
> +
> + module_call_init(MODULE_INIT_QOM);
> + qemu_init_exec_dir(argv[0]);
> +
> + while ((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {
> + switch (ch) {
> + case 'e':
> + exit_when_panic = true;
> + break;
> + case 'n':
> + optarg = (char *) "none";
> + /* fallthrough */
> + case QEMU_VU_OPT_CACHE:
> + if (seen_cache) {
> + error_report("-n and --cache can only be specified once");
> + exit(EXIT_FAILURE);
> + }
> + seen_cache = true;
> + if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) == -1) {
> + error_report("Invalid cache mode `%s'", optarg);
> + exit(EXIT_FAILURE);
> + }
> + break;
> + case QEMU_VU_OPT_AIO:
> + if (seen_aio) {
> + error_report("--aio can only be specified once");
> + exit(EXIT_FAILURE);
> + }
> + seen_aio = true;
> + if (!strcmp(optarg, "native")) {
> + flags |= BDRV_O_NATIVE_AIO;
> + } else if (!strcmp(optarg, "threads")) {
> + /* this is the default */
> + } else {
> + error_report("invalid aio mode `%s'", optarg);
> + exit(EXIT_FAILURE);
> + }
> + break;
> + case 'r':
> + readonly = true;
> + flags &= ~BDRV_O_RDWR;
> + break;
> + case 'k':
> + sockpath = optarg;
> + if (sockpath[0] != '/') {
> + error_report("socket path must be absolute");
> + exit(EXIT_FAILURE);
> + }
> + break;
> + case 'f':
> + fmt = optarg;
> + break;
> + case 'V':
> + version(argv[0]);
> + exit(0);
> + break;
> + case 'h':
> + usage(argv[0]);
> + exit(0);
> + break;
> + case '?':
> + error_report("Try `%s --help' for more information.", argv[0]);
> + exit(EXIT_FAILURE);
> + }
> + }
> +
> + if ((argc - optind) != 1) {
> + error_report("Invalid number of arguments");
> + error_printf("Try `%s --help' for more information.\n", argv[0]);
> + exit(EXIT_FAILURE);
> + }
> + if (qemu_init_main_loop(&local_err)) {
> + error_report_err(local_err);
> + exit(EXIT_FAILURE);
> + }
> + bdrv_init();
> +
> + srcpath = argv[optind];
> + if (fmt) {
> + options = qdict_new();
> + qdict_put_str(options, "driver", fmt);
> + }
> + blk = blk_new_open(srcpath, NULL, options, flags, &local_err);
> +
> + if (!blk) {
> + error_reportf_err(local_err, "Failed to blk_new_open '%s': ",
> + argv[optind]);
> + exit(EXIT_FAILURE);
> + }
> + bs = blk_bs(blk);
> +
> + char buf[300];
> + snprintf(buf, 300, "%s,id=%s,node-name=%s,unix-socket=%s,writable=%s",
> + TYPE_VHOST_USER_BLK_SERVER, QEMU_VU_OBJ_ID, bdrv_get_node_name(bs),
> + sockpath, !readonly ? "on" : "off");
> + /* While calling user_creatable_del, 'object' group is required */
> + qemu_add_opts(&qemu_object_opts);
> + QemuOpts *opts = qemu_opts_parse(&qemu_object_opts, buf, true, &local_err);
> + if (local_err) {
> + error_report_err(local_err);
> + goto error;
> + }
> +
> + Object *obj = user_creatable_add_opts(opts, &local_err);
> +
> + if (local_err) {
> + error_report_err(local_err);
> + goto error;
> + }
> +
> + vu_block_device = VHOST_USER_BLK_SERVER(obj);
> + vu_block_device->exit_when_panic = exit_when_panic;
> +
> + do {
> + main_loop_wait(false);
> + } while (!vu_block_device->exit_when_panic || !vu_block_device->vu_server->close);
> +
> + error:
> + vus_shutdown();
> + exit(EXIT_SUCCESS);
> +}
> --
> 2.25.0
>
--
Best regards,
Coiby
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-02-13 7:06 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 9:51 [PATCH v3 0/5] vhost-user block device backend implementation Coiby Xu
2020-02-12 9:51 ` [PATCH v3 1/5] extend libvhost to support IOThread and coroutine Coiby Xu
2020-02-12 9:51 ` [PATCH v3 2/5] generic vhost user server Coiby Xu
2020-02-12 9:51 ` [PATCH v3 3/5] vhost-user block device backend server Coiby Xu
2020-02-12 9:51 ` [PATCH v3 4/5] a standone-alone tool to directly share disk image file via vhost-user protocol Coiby Xu
2020-02-13 7:05 ` Coiby Xu
2020-02-12 9:51 ` [PATCH v3 5/5] new qTest case to test the vhost-user-blk-server Coiby Xu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.