All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/8] blkio: add libblkio BlockDriver
@ 2022-03-23 11:17 Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 1/8] blkio: add io_uring block driver using libblkio Stefan Hajnoczi
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

This patch series adds a QEMU BlockDriver for libblkio
(https://gitlab.com/libblkio/libblkio/), a library for high-performance block
device I/O. Currently libblkio has basic io_uring support with additional
drivers in development.

The first patch adds the core BlockDriver and most of the libblkio API usage.
The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
API so virtio-blk emulation efficiently map guest RAM for libblkio - some
libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
vhost, etc).

This block driver is incomplete because bdrv_refresh_limits() and several other
APIs are not yet implemented. You can already boot a guest though. Once the
missing gaps have been filled in I will send a non-RFC patch series.

Regarding the design: each libblkio driver is a separately named BlockDriver.
That means there is an "io_uring" BlockDriver and not a generic "libblkio"
BlockDriver. In the future there will be additional BlockDrivers, all defined
in block/blkio.c. This way QAPI and open parameters are type-safe and mandatory
parameters can be checked by QEMU.

Stefan Hajnoczi (8):
  blkio: add io_uring block driver using libblkio
  numa: call ->ram_block_removed() in ram_block_notifer_remove()
  block: pass size to bdrv_unregister_buf()
  block: add BDRV_REQ_REGISTERED_BUF request flag
  block: add BlockRAMRegistrar
  stubs: add memory_region_from_host() and memory_region_get_fd()
  blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

 MAINTAINERS                                 |   7 +
 meson_options.txt                           |   2 +
 qapi/block-core.json                        |  18 +-
 meson.build                                 |   9 +
 include/block/block-common.h                |   9 +
 include/block/block-global-state.h          |   5 +-
 include/block/block_int-common.h            |   2 +-
 include/hw/virtio/virtio-blk.h              |   2 +
 include/sysemu/block-backend-global-state.h |   2 +-
 include/sysemu/block-ram-registrar.h        |  30 ++
 block/blkio.c                               | 560 ++++++++++++++++++++
 block/blkverify.c                           |   4 +-
 block/block-backend.c                       |   4 +-
 block/block-ram-registrar.c                 |  39 ++
 block/crypto.c                              |   2 +
 block/io.c                                  |  36 +-
 block/mirror.c                              |   2 +
 block/nvme.c                                |   2 +-
 block/raw-format.c                          |   2 +
 hw/block/virtio-blk.c                       |  13 +-
 hw/core/numa.c                              |  17 +
 qemu-img.c                                  |   4 +-
 stubs/memory.c                              |  13 +
 tests/qtest/modules-test.c                  |   3 +
 util/vfio-helpers.c                         |   5 +-
 block/meson.build                           |   2 +
 scripts/meson-buildoptions.sh               |   3 +
 stubs/meson.build                           |   1 +
 28 files changed, 772 insertions(+), 26 deletions(-)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/blkio.c
 create mode 100644 block/block-ram-registrar.c
 create mode 100644 stubs/memory.c

-- 
2.35.1




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/8] blkio: add io_uring block driver using libblkio
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove() Stefan Hajnoczi
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
high-performance disk I/O. It currently supports io_uring with
additional drivers planned.

One of the reasons for developing libblkio is that other applications
besides QEMU can use it. This will be particularly useful for
vhost-user-blk which applications may wish to use for connecting to
qemu-storage-daemon.

libblkio also gives us an opportunity to develop in Rust behind a C API
that is easy to consume from QEMU.

This commit adds an io_uring BlockDriver to QEMU using libblkio. For now
I/O buffers are copied through bounce buffers if the libblkio driver
requires it. Later commits add an optimization for pre-registering guest
RAM to avoid bounce buffers. It will be easy to add other libblkio
drivers since they will share the majority of code.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS                   |   6 +
 meson_options.txt             |   2 +
 qapi/block-core.json          |  18 +-
 meson.build                   |   9 +
 block/blkio.c                 | 460 ++++++++++++++++++++++++++++++++++
 tests/qtest/modules-test.c    |   3 +
 block/meson.build             |   1 +
 scripts/meson-buildoptions.sh |   3 +
 8 files changed, 501 insertions(+), 1 deletion(-)
 create mode 100644 block/blkio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cc364afef7..0fb08dd4f7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3349,6 +3349,12 @@ L: qemu-block@nongnu.org
 S: Maintained
 F: block/vdi.c
 
+blkio
+M: Stefan Hajnoczi <stefanha@redhat.com>
+L: qemu-block@nongnu.org
+S: Maintained
+F: block/blkio.c
+
 iSCSI
 M: Ronnie Sahlberg <ronniesahlberg@gmail.com>
 M: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..1e82e770e7 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -101,6 +101,8 @@ option('bzip2', type : 'feature', value : 'auto',
        description: 'bzip2 support for DMG images')
 option('cap_ng', type : 'feature', value : 'auto',
        description: 'cap_ng support')
+option('blkio', type : 'feature', value : 'auto',
+       description: 'libblkio block device driver')
 option('bpf', type : 'feature', value : 'auto',
         description: 'eBPF support')
 option('cocoa', type : 'feature', value : 'auto',
diff --git a/qapi/block-core.json b/qapi/block-core.json
index e89f2dfb5b..d9bb283108 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2924,7 +2924,9 @@
             'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
             {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
-            'http', 'https', 'iscsi',
+            'http', 'https',
+            { 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
+            'iscsi',
             'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
             'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
@@ -3656,6 +3658,18 @@
             '*debug': 'int',
             '*logfile': 'str' } }
 
+##
+# @BlockdevOptionsIoUring:
+#
+# Driver specific block device options for the io_uring backend.
+#
+# @filename: path to the image file
+#
+# Since: 6.3
+##
+{ 'struct': 'BlockdevOptionsIoUring',
+  'data': { 'filename': 'str' } }
+
 ##
 # @IscsiTransport:
 #
@@ -4254,6 +4268,8 @@
                        'if': 'HAVE_HOST_BLOCK_DEVICE' },
       'http':       'BlockdevOptionsCurlHttp',
       'https':      'BlockdevOptionsCurlHttps',
+      'io_uring':   { 'type': 'BlockdevOptionsIoUring',
+                      'if': 'CONFIG_BLKIO' },
       'iscsi':      'BlockdevOptionsIscsi',
       'luks':       'BlockdevOptionsLUKS',
       'nbd':        'BlockdevOptionsNbd',
diff --git a/meson.build b/meson.build
index aef724ad3c..d52f542b2e 100644
--- a/meson.build
+++ b/meson.build
@@ -636,6 +636,13 @@ if not get_option('virglrenderer').auto() or have_system or have_vhost_user_gpu
                      required: get_option('virglrenderer'),
                      kwargs: static_kwargs)
 endif
+blkio = not_found
+if not get_option('blkio').auto() or have_block
+  blkio = dependency('blkio',
+                     method: 'pkg-config',
+                     required: get_option('blkio'),
+                     kwargs: static_kwargs)
+endif
 curl = not_found
 if not get_option('curl').auto() or have_block
   curl = dependency('libcurl', version: '>=7.29.0',
@@ -1519,6 +1526,7 @@ config_host_data.set('CONFIG_LIBUDEV', libudev.found())
 config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
+config_host_data.set('CONFIG_BLKIO', blkio.found())
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())
@@ -3672,6 +3680,7 @@ summary_info += {'PAM':               pam}
 summary_info += {'iconv support':     iconv}
 summary_info += {'curses support':    curses}
 summary_info += {'virgl support':     virgl}
+summary_info += {'blkio support':     blkio}
 summary_info += {'curl support':      curl}
 summary_info += {'Multipath support': mpathpersist}
 summary_info += {'VNC support':       vnc}
diff --git a/block/blkio.c b/block/blkio.c
new file mode 100644
index 0000000000..dd2308b967
--- /dev/null
+++ b/block/blkio.c
@@ -0,0 +1,460 @@
+#include "qemu/osdep.h"
+#include <blkio.h>
+#include "block/block_int.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/module.h"
+
+typedef struct BlkAIOCB {
+    BlockAIOCB common;
+    struct blkio_mem_region mem_region;
+    QEMUIOVector qiov;
+    struct iovec bounce_iov;
+} BlkioAIOCB;
+
+typedef struct {
+    /* Protects ->blkio and request submission on ->blkioq */
+    QemuMutex lock;
+
+    struct blkio *blkio;
+    struct blkioq *blkioq; /* this could be multi-queue in the future */
+    int completion_fd;
+
+    /* The value of the "mem-region-alignment" property */
+    size_t mem_region_alignment;
+
+    /* Can we skip adding/deleting blkio_mem_regions? */
+    bool needs_mem_regions;
+
+    /*
+     * blkio_completion_fd_poll() stashes the next completion for
+     * blkio_completion_fd_poll_ready().
+     */
+    struct blkio_completion pending_completion;
+} BDRVBlkioState;
+
+static void blkio_aiocb_complete(BlkioAIOCB *acb, int ret)
+{
+    /* Copy bounce buffer back to qiov */
+    if (acb->qiov.niov > 0) {
+        qemu_iovec_from_buf(&acb->qiov, 0,
+                acb->bounce_iov.iov_base,
+                acb->bounce_iov.iov_len);
+        qemu_iovec_destroy(&acb->qiov);
+    }
+
+    acb->common.cb(acb->common.opaque, ret);
+
+    if (acb->mem_region.len > 0) {
+        BDRVBlkioState *s = acb->common.bs->opaque;
+
+        WITH_QEMU_LOCK_GUARD(&s->lock) {
+            blkio_free_mem_region(s->blkio, &acb->mem_region);
+        }
+    }
+
+    qemu_aio_unref(&acb->common);
+}
+
+/*
+ * Only the thread that calls aio_poll() invokes fd and poll handlers.
+ * Therefore locks are not necessary except when accessing s->blkio.
+ *
+ * No locking is performed around blkioq_get_completions() although other
+ * threads may submit I/O requests on s->blkioq. We're assuming there is no
+ * inteference between blkioq_get_completions() and other s->blkioq APIs.
+ */
+
+static void blkio_completion_fd_read(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVBlkioState *s = bs->opaque;
+    struct blkio_completion completion;
+    uint64_t val;
+    ssize_t ret __attribute__((unused));
+
+    /* Reset completion fd status */
+    ret = read(s->completion_fd, &val, sizeof(val));
+
+    /*
+     * Reading one completion at a time makes nested event loop re-entrancy
+     * simple. Change this loop to get multiple completions in one go if it
+     * becomes a performance bottleneck.
+     */
+    while (blkioq_get_completions(s->blkioq, &completion, 1) == 1) {
+        blkio_aiocb_complete(completion.user_data, completion.ret);
+    }
+}
+
+static bool blkio_completion_fd_poll(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVBlkioState *s = bs->opaque;
+
+    return blkioq_get_completions(s->blkioq, &s->pending_completion, 1) == 1;
+}
+
+static void blkio_completion_fd_poll_ready(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVBlkioState *s = bs->opaque;
+    BlkioAIOCB *acb = s->pending_completion.user_data;
+
+    blkio_aiocb_complete(acb, s->pending_completion.ret);
+
+    /* Process any remaining completions */
+    blkio_completion_fd_read(opaque);
+}
+
+static void blkio_attach_aio_context(BlockDriverState *bs,
+                                     AioContext *new_context)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    aio_set_fd_handler(new_context,
+                       s->completion_fd,
+                       false,
+                       blkio_completion_fd_read,
+                       NULL,
+                       blkio_completion_fd_poll,
+                       blkio_completion_fd_poll_ready,
+                       bs);
+}
+
+static void blkio_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    aio_set_fd_handler(bdrv_get_aio_context(bs),
+                       s->completion_fd,
+                       false, NULL, NULL, NULL, NULL, NULL);
+}
+
+static const AIOCBInfo blkio_aiocb_info = {
+    .aiocb_size = sizeof(BlkioAIOCB),
+};
+
+/* Create a BlkioAIOCB */
+static BlkioAIOCB *blkio_aiocb_get(BlockDriverState *bs,
+                                   BlockCompletionFunc *cb,
+                                   void *opaque)
+{
+    BlkioAIOCB *acb = qemu_aio_get(&blkio_aiocb_info, bs, cb, opaque);
+
+    /* A few fields need to be initialized, leave the rest... */
+    acb->qiov.niov = 0;
+    acb->mem_region.len = 0;
+    return acb;
+}
+
+/* s->lock must be held */
+static int blkio_aiocb_init_mem_region_locked(BlkioAIOCB *acb, size_t len)
+{
+    BDRVBlkioState *s = acb->common.bs->opaque;
+    size_t mem_region_len = QEMU_ALIGN_UP(len, s->mem_region_alignment);
+    int ret;
+
+    ret = blkio_alloc_mem_region(s->blkio, &acb->mem_region, mem_region_len,
+                                 0, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    acb->bounce_iov.iov_base = acb->mem_region.addr;
+    acb->bounce_iov.iov_len = len;
+    return 0;
+}
+
+static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
+        int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags,
+        BlockCompletionFunc *cb, void *opaque)
+{
+    BDRVBlkioState *s = bs->opaque;
+    struct iovec *iov = qiov->iov;
+    int iovcnt = qiov->niov;
+    BlkioAIOCB *acb;
+    int ret;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    acb = blkio_aiocb_get(bs, cb, opaque);
+
+    if (s->needs_mem_regions) {
+        if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
+            qemu_aio_unref(&acb->common);
+            return NULL;
+        }
+
+        /* Copy qiov because we'll call qemu_iovec_from_buf() on completion */
+        qemu_iovec_init_slice(&acb->qiov, qiov, 0, qiov->size);
+
+        iov = &acb->bounce_iov;
+        iovcnt = 1;
+    }
+
+    ret = blkioq_readv(s->blkioq, offset, iov, iovcnt, acb, 0);
+    if (ret < 0) {
+        if (s->needs_mem_regions) {
+            blkio_free_mem_region(s->blkio, &acb->mem_region);
+            qemu_iovec_destroy(&acb->qiov);
+        }
+        qemu_aio_unref(&acb->common);
+        return NULL;
+    }
+
+    if (qatomic_read(&bs->io_plugged) == 0) {
+        blkioq_submit_and_wait(s->blkioq, 0, NULL, NULL);
+    }
+
+    return &acb->common;
+}
+
+static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
+        int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags,
+        BlockCompletionFunc *cb, void *opaque)
+{
+    uint32_t blkio_flags = (flags & BDRV_REQ_FUA) ? BLKIO_REQ_FUA : 0;
+    BDRVBlkioState *s = bs->opaque;
+    struct iovec *iov = qiov->iov;
+    int iovcnt = qiov->niov;
+    BlkioAIOCB *acb;
+    int ret;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    acb = blkio_aiocb_get(bs, cb, opaque);
+
+    if (s->needs_mem_regions) {
+        if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
+            qemu_aio_unref(&acb->common);
+            return NULL;
+        }
+
+        qemu_iovec_to_buf(qiov, 0, acb->bounce_iov.iov_base, bytes);
+
+        iov = &acb->bounce_iov;
+        iovcnt = 1;
+    }
+
+    ret = blkioq_writev(s->blkioq, offset, iov, iovcnt, acb, blkio_flags);
+    if (ret < 0) {
+        if (s->needs_mem_regions) {
+            blkio_free_mem_region(s->blkio, &acb->mem_region);
+        }
+        qemu_aio_unref(&acb->common);
+        return NULL;
+    }
+
+    if (qatomic_read(&bs->io_plugged) == 0) {
+        blkioq_submit_and_wait(s->blkioq, 0, NULL, NULL);
+    }
+
+    return &acb->common;
+}
+
+static BlockAIOCB *blkio_aio_flush(BlockDriverState *bs,
+                                   BlockCompletionFunc *cb,
+                                   void *opaque)
+{
+    BDRVBlkioState *s = bs->opaque;
+    BlkioAIOCB *acb;
+    int ret;
+
+    QEMU_LOCK_GUARD(&s->lock);
+
+    acb = blkio_aiocb_get(bs, cb, opaque);
+
+    ret = blkioq_flush(s->blkioq, acb, 0);
+    if (ret < 0) {
+        qemu_aio_unref(&acb->common);
+        return NULL;
+    }
+
+    if (qatomic_read(&bs->io_plugged) == 0) {
+        blkioq_submit_and_wait(s->blkioq, 0, NULL, NULL);
+    }
+
+    return &acb->common;
+}
+
+static void blkio_io_unplug(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        blkioq_submit_and_wait(s->blkioq, 0, NULL, NULL);
+    }
+}
+
+static void blkio_parse_filename_io_uring(const char *filename, QDict *options,
+                                          Error **errp)
+{
+    bdrv_parse_filename_strip_prefix(filename, "io_uring:", options);
+}
+
+static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
+                           Error **errp)
+{
+    const char *filename = qdict_get_try_str(options, "filename");
+    const char *blkio_driver = bs->drv->protocol_name;
+    BDRVBlkioState *s = bs->opaque;
+    char *errmsg;
+    int ret;
+
+    ret = blkio_create(blkio_driver, &s->blkio, &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "blkio_create failed: %s", errmsg);
+        free(errmsg);
+        return ret;
+    }
+
+    ret = blkio_set_str(s->blkio, "path", filename, &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "failed to set path: %s", errmsg);
+        free(errmsg);
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
+    if (flags & BDRV_O_NOCACHE) {
+        ret = blkio_set_bool(s->blkio, "direct", true, &errmsg);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "failed to set direct: %s", errmsg);
+            free(errmsg);
+            blkio_destroy(&s->blkio);
+            return ret;
+        }
+    }
+
+    if (!(flags & BDRV_O_RDWR)) {
+        ret = blkio_set_bool(s->blkio, "readonly", true, &errmsg);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "failed to set readonly: %s", errmsg);
+            free(errmsg);
+            blkio_destroy(&s->blkio);
+            return ret;
+        }
+    }
+
+    ret = blkio_connect(s->blkio, &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "blkio_connect failed: %s", errmsg);
+        free(errmsg);
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
+    ret = blkio_get_bool(s->blkio,
+                         "needs-mem-regions",
+                         &s->needs_mem_regions,
+                         &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "failed to get needs-mem-regions: %s", errmsg);
+        free(errmsg);
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
+    ret = blkio_get_uint64(s->blkio,
+                           "mem-region-alignment",
+                           &s->mem_region_alignment,
+                           &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "failed to get mem-region-alignment: %s", errmsg);
+        free(errmsg);
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
+    ret = blkio_start(s->blkio, &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "blkio_start failed: %s", errmsg);
+        free(errmsg);
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
+    bs->supported_write_flags = BDRV_REQ_FUA;
+
+    qemu_mutex_init(&s->lock);
+    s->blkioq = blkio_get_queue(s->blkio, 0);
+    s->completion_fd = blkioq_get_completion_fd(s->blkioq);
+
+    blkio_attach_aio_context(bs, bdrv_get_aio_context(bs));
+
+    qdict_del(options, "filename");
+    return 0;
+}
+
+static void blkio_close(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    qemu_mutex_destroy(&s->lock);
+    blkio_destroy(&s->blkio);
+}
+
+static int64_t blkio_getlength(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+    char *errmsg;
+    uint64_t capacity;
+    int ret;
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        ret = blkio_get_uint64(s->blkio, "capacity", &capacity, &errmsg);
+    }
+    if (ret < 0) {
+        free(errmsg);
+        return -ret;
+    }
+
+    return capacity;
+}
+
+static int blkio_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    return 0;
+}
+
+static BlockDriver bdrv_io_uring = {
+    .format_name                = "io_uring",
+    .protocol_name              = "io_uring",
+    .instance_size              = sizeof(BDRVBlkioState),
+    .bdrv_needs_filename        = true,
+    .bdrv_parse_filename        = blkio_parse_filename_io_uring,
+    .bdrv_file_open             = blkio_file_open,
+    .bdrv_close                 = blkio_close,
+    .bdrv_getlength             = blkio_getlength,
+    .has_variable_length        = true,
+    .bdrv_get_info              = blkio_get_info,
+    .bdrv_attach_aio_context    = blkio_attach_aio_context,
+    .bdrv_detach_aio_context    = blkio_detach_aio_context,
+    .bdrv_aio_preadv            = blkio_aio_preadv,
+    .bdrv_aio_pwritev           = blkio_aio_pwritev,
+    .bdrv_aio_flush             = blkio_aio_flush,
+    .bdrv_io_unplug             = blkio_io_unplug,
+
+    /*
+     * TODO
+     * Missing libblkio APIs:
+     * - bdrv_refresh_limits
+     * - write zeroes
+     * - discard
+     * - block_status
+     * - co_invalidate_cache
+     *
+     * Out of scope?
+     * - create
+     * - truncate
+     */
+};
+
+static void bdrv_blkio_init(void)
+{
+    bdrv_register(&bdrv_io_uring);
+}
+
+block_init(bdrv_blkio_init);
diff --git a/tests/qtest/modules-test.c b/tests/qtest/modules-test.c
index c238b3f422..2e0f73fdc6 100644
--- a/tests/qtest/modules-test.c
+++ b/tests/qtest/modules-test.c
@@ -16,6 +16,9 @@ static void test_modules_load(const void *data)
 int main(int argc, char *argv[])
 {
     const char *modules[] = {
+#ifdef CONFIG_BLKIO
+        "block-", "blkio",
+#endif
 #ifdef CONFIG_CURL
         "block-", "curl",
 #endif
diff --git a/block/meson.build b/block/meson.build
index 0b2a60c99b..787667384a 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -92,6 +92,7 @@ block_modules = {}
 
 modsrc = []
 foreach m : [
+  [blkio, 'blkio', files('blkio.c')],
   [curl, 'curl', files('curl.c')],
   [glusterfs, 'gluster', files('gluster.c')],
   [libiscsi, 'iscsi', [files('iscsi.c'), libm]],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 1e26f4571e..845a29b9b7 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -40,6 +40,7 @@ meson_options_help() {
   printf "%s\n" '  auth-pam        PAM access control'
   printf "%s\n" '  avx2            AVX2 optimizations'
   printf "%s\n" '  avx512f         AVX512F optimizations'
+  printf "%s\n" '  blkio           libblkio block device driver'
   printf "%s\n" '  bochs           bochs image format support'
   printf "%s\n" '  bpf             eBPF support'
   printf "%s\n" '  brlapi          brlapi character device driver'
@@ -146,6 +147,8 @@ _meson_option_parse() {
     --disable-avx2) printf "%s" -Davx2=disabled ;;
     --enable-avx512f) printf "%s" -Davx512f=enabled ;;
     --disable-avx512f) printf "%s" -Davx512f=disabled ;;
+    --enable-blkio) printf "%s" -Dblkio=enabled ;;
+    --disable-blkio) printf "%s" -Dblkio=disabled ;;
     --enable-block-drv-whitelist-in-tools) printf "%s" -Dblock_drv_whitelist_in_tools=true ;;
     --disable-block-drv-whitelist-in-tools) printf "%s" -Dblock_drv_whitelist_in_tools=false ;;
     --enable-bochs) printf "%s" -Dbochs=enabled ;;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove()
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 1/8] blkio: add io_uring block driver using libblkio Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-30  8:42   ` David Hildenbrand
  2022-03-23 11:17 ` [RFC 3/8] block: pass size to bdrv_unregister_buf() Stefan Hajnoczi
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, David Hildenbrand, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

When a RAMBlockNotifier is added, ->ram_block_added() is called with all
existing RAMBlocks. There is no equivalent ->ram_block_removed() call
when a RAMBlockNotifier is removed.

The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine
with this asymmetry because it does not rely on RAMBlockNotifier for
cleanup. It walks its internal list of DMA mappings and unmaps them by
itself.

Future users of RAMBlockNotifier may not have an internal data structure
that records added RAMBlocks so they will need ->ram_block_removed()
callbacks.

This patch makes ram_block_notifier_remove() symmetric with respect to
callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings
after ram_block_notifier_remove() has been called. This is necessary
since users like block/nvme.c may create additional DMA mappings that do
not originate from the RAMBlockNotifier.

Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/core/numa.c      | 17 +++++++++++++++++
 util/vfio-helpers.c |  5 ++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 1aa05dcf42..6bf9694d20 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -822,6 +822,19 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
     return 0;
 }
 
+static int ram_block_notify_remove_single(RAMBlock *rb, void *opaque)
+{
+    const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+    const ram_addr_t size = qemu_ram_get_used_length(rb);
+    void *host = qemu_ram_get_host_addr(rb);
+    RAMBlockNotifier *notifier = opaque;
+
+    if (host) {
+        notifier->ram_block_removed(notifier, host, size, max_size);
+    }
+    return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
@@ -835,6 +848,10 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
     QLIST_REMOVE(n, next);
+
+    if (n->ram_block_removed) {
+        qemu_ram_foreach_block(ram_block_notify_remove_single, n);
+    }
 }
 
 void ram_block_notify_add(void *host, size_t size, size_t max_size)
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index b037d5faa5..dc90496592 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -847,10 +847,13 @@ void qemu_vfio_close(QEMUVFIOState *s)
     if (!s) {
         return;
     }
+
+    ram_block_notifier_remove(&s->ram_notifier);
+
     for (i = 0; i < s->nr_mappings; ++i) {
         qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
     }
-    ram_block_notifier_remove(&s->ram_notifier);
+
     g_free(s->usable_iova_ranges);
     s->nb_iova_ranges = 0;
     qemu_vfio_reset(s);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 3/8] block: pass size to bdrv_unregister_buf()
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 1/8] blkio: add io_uring block driver using libblkio Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove() Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag Stefan Hajnoczi
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

The only implementor of bdrv_register_buf() is block/nvme.c, where the
size is not needed when unregistering a buffer. This is because
util/vfio-helpers.c can look up mappings by address.

Future block drivers that implement bdrv_register_buf() may not be able
to do their job given only the buffer address. Add a size argument to
bdrv_unregister_buf().

Also document the assumptions about
bdrv_register_buf()/bdrv_unregister_buf() calls. The same <host, size>
values that were given to bdrv_register_buf() must be given to
bdrv_unregister_buf().

gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local
variable might be uninitialized, so it's necessary to silence the
compiler.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block-global-state.h          | 5 ++++-
 include/block/block_int-common.h            | 2 +-
 include/sysemu/block-backend-global-state.h | 2 +-
 block/block-backend.c                       | 4 ++--
 block/io.c                                  | 6 +++---
 block/nvme.c                                | 2 +-
 qemu-img.c                                  | 4 ++--
 7 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 25bb69bbef..2295a7c767 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -244,9 +244,12 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
  * Register/unregister a buffer for I/O. For example, VFIO drivers are
  * interested to know the memory areas that would later be used for I/O, so
  * that they can prepare IOMMU mapping etc., to get better performance.
+ *
+ * Buffers must not overlap and they must be unregistered with the same <host,
+ * size> values that they were registered with.
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
-void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size);
 
 void bdrv_cancel_in_flight(BlockDriverState *bs);
 
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..b7a7cbd3a5 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -435,7 +435,7 @@ struct BlockDriver {
      * DMA mapping for hot buffers.
      */
     void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
-    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size);
 
     /*
      * This field is modified only under the BQL, and is part of
diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 2e93a74679..989ec0364b 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -107,7 +107,7 @@ void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
-void blk_unregister_buf(BlockBackend *blk, void *host);
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size);
 
 const BdrvChild *blk_root(BlockBackend *blk);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..8af00d8a36 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2591,10 +2591,10 @@ void blk_register_buf(BlockBackend *blk, void *host, size_t size)
     bdrv_register_buf(blk_bs(blk), host, size);
 }
 
-void blk_unregister_buf(BlockBackend *blk, void *host)
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
 {
     GLOBAL_STATE_CODE();
-    bdrv_unregister_buf(blk_bs(blk), host);
+    bdrv_unregister_buf(blk_bs(blk), host, size);
 }
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
diff --git a/block/io.c b/block/io.c
index 3280144a17..a8a7920e29 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3365,16 +3365,16 @@ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size)
     }
 }
 
-void bdrv_unregister_buf(BlockDriverState *bs, void *host)
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size)
 {
     BdrvChild *child;
 
     GLOBAL_STATE_CODE();
     if (bs->drv && bs->drv->bdrv_unregister_buf) {
-        bs->drv->bdrv_unregister_buf(bs, host);
+        bs->drv->bdrv_unregister_buf(bs, host, size);
     }
     QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_unregister_buf(child->bs, host);
+        bdrv_unregister_buf(child->bs, host, size);
     }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index 552029931d..88485e77f1 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1592,7 +1592,7 @@ static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size)
     }
 }
 
-static void nvme_unregister_buf(BlockDriverState *bs, void *host)
+static void nvme_unregister_buf(BlockDriverState *bs, void *host, size_t size)
 {
     BDRVNVMeState *s = bs->opaque;
 
diff --git a/qemu-img.c b/qemu-img.c
index 1caddfb23a..493fad758a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4363,7 +4363,7 @@ static int img_bench(int argc, char **argv)
     struct timeval t1, t2;
     int i;
     bool force_share = false;
-    size_t buf_size;
+    size_t buf_size = 0;
 
     for (;;) {
         static const struct option long_options[] = {
@@ -4585,7 +4585,7 @@ static int img_bench(int argc, char **argv)
 
 out:
     if (data.buf) {
-        blk_unregister_buf(blk, data.buf);
+        blk_unregister_buf(blk, data.buf, buf_size);
     }
     qemu_vfree(data.buf);
     blk_unref(blk);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2022-03-23 11:17 ` [RFC 3/8] block: pass size to bdrv_unregister_buf() Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 5/8] block: add BlockRAMRegistrar Stefan Hajnoczi
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.

Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.

bdrv_aligned_preadv() is strict in validating supported read flags and
its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no
harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not
support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF.

Care must be taken to clear the flag when the block layer or filter
drivers replace QEMUIOVector elements with bounce buffers since these
have not been registered with bdrv_register_buf(). A lot of the changes
in this commit deal with clearing the flag in those cases.

Ensuring that the flag is cleared properly is somewhat invasive to
implement across the block layer and it's hard to spot when future code
changes accidentally break it. Another option might be to add a flag to
QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify
elements. That is more robust but somewhat of a layering violation, so I
haven't attempted that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block-common.h |  9 +++++++++
 block/blkverify.c            |  4 ++--
 block/crypto.c               |  2 ++
 block/io.c                   | 30 +++++++++++++++++++++++-------
 block/mirror.c               |  2 ++
 block/raw-format.c           |  2 ++
 6 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..061606e867 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -80,6 +80,15 @@ typedef enum {
      */
     BDRV_REQ_MAY_UNMAP          = 0x4,
 
+    /*
+     * An optimization hint when all QEMUIOVector elements are within
+     * previously registered bdrv_register_buf() memory ranges.
+     *
+     * Code that replaces the user's QEMUIOVector elements with bounce buffers
+     * must take care to clear this flag.
+     */
+    BDRV_REQ_REGISTERED_BUF     = 0x8,
+
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
diff --git a/block/blkverify.c b/block/blkverify.c
index e4a37af3b2..d624f4fd05 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -235,8 +235,8 @@ blkverify_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
     qemu_iovec_init(&raw_qiov, qiov->niov);
     qemu_iovec_clone(&raw_qiov, qiov, buf);
 
-    ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov, flags,
-                            false);
+    ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov,
+                            flags & ~BDRV_REQ_REGISTERED_BUF, false);
 
     cmp_offset = qemu_iovec_compare(qiov, &raw_qiov);
     if (cmp_offset != -1) {
diff --git a/block/crypto.c b/block/crypto.c
index 1ba82984ef..c900355adb 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -473,6 +473,8 @@ block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
     uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
     uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
 
+    flags &= ~BDRV_REQ_REGISTERED_BUF;
+
     assert(!(flags & ~BDRV_REQ_FUA));
     assert(payload_offset < INT64_MAX);
     assert(QEMU_IS_ALIGNED(offset, sector_size));
diff --git a/block/io.c b/block/io.c
index a8a7920e29..139e36c2e1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1556,11 +1556,14 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
-    /* TODO: We would need a per-BDS .supported_read_flags and
+    /*
+     * TODO: We would need a per-BDS .supported_read_flags and
      * potential fallback support, if we ever implement any read flags
      * to pass through to drivers.  For now, there aren't any
-     * passthrough flags.  */
-    assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH)));
+     * passthrough flags except the BDRV_REQ_REGISTERED_BUF optimization hint.
+     */
+    assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH |
+                       BDRV_REQ_REGISTERED_BUF)));
 
     /* Handle Copy on Read and associated serialisation */
     if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1601,7 +1604,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
         goto out;
     }
 
-    assert(!(flags & ~bs->supported_read_flags));
+    assert(!(flags & ~(bs->supported_read_flags | BDRV_REQ_REGISTERED_BUF)));
 
     max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
     if (bytes <= max_bytes && bytes <= max_transfer) {
@@ -1790,7 +1793,8 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
 static int bdrv_pad_request(BlockDriverState *bs,
                             QEMUIOVector **qiov, size_t *qiov_offset,
                             int64_t *offset, int64_t *bytes,
-                            BdrvRequestPadding *pad, bool *padded)
+                            BdrvRequestPadding *pad, bool *padded,
+                            BdrvRequestFlags *flags)
 {
     int ret;
 
@@ -1818,6 +1822,10 @@ static int bdrv_pad_request(BlockDriverState *bs,
     if (padded) {
         *padded = true;
     }
+    if (flags) {
+        /* Can't use optimization hint with bounce buffer */
+        *flags &= ~BDRV_REQ_REGISTERED_BUF;
+    }
 
     return 0;
 }
@@ -1872,7 +1880,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
     }
 
     ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
-                           NULL);
+                           NULL, &flags);
     if (ret < 0) {
         goto fail;
     }
@@ -1917,6 +1925,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
         return -ENOTSUP;
     }
 
+    /* By definition there is no user buffer so this flag doesn't make sense */
+    if (flags & BDRV_REQ_REGISTERED_BUF) {
+        return -EINVAL;
+    }
+
     /* Invalidate the cached block-status data range if this write overlaps */
     bdrv_bsc_invalidate_range(bs, offset, bytes);
 
@@ -2202,6 +2215,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
     bool padding;
     BdrvRequestPadding pad;
 
+    /* This flag doesn't make sense for padding or zero writes */
+    flags &= ~BDRV_REQ_REGISTERED_BUF;
+
     padding = bdrv_init_padding(bs, offset, bytes, &pad);
     if (padding) {
         assert(!(flags & BDRV_REQ_NO_WAIT));
@@ -2319,7 +2335,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
          * alignment only if there is no ZERO flag.
          */
         ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
-                               &padded);
+                               &padded, &flags);
         if (ret < 0) {
             return ret;
         }
diff --git a/block/mirror.c b/block/mirror.c
index d8ecb9efa2..3a0773622d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1477,6 +1477,8 @@ static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
         qemu_iovec_init(&bounce_qiov, 1);
         qemu_iovec_add(&bounce_qiov, bounce_buf, bytes);
         qiov = &bounce_qiov;
+
+        flags &= ~BDRV_REQ_REGISTERED_BUF;
     }
 
     ret = bdrv_mirror_top_do_write(bs, MIRROR_METHOD_COPY, offset, bytes, qiov,
diff --git a/block/raw-format.c b/block/raw-format.c
index 69fd650eaf..9bae3dd7f2 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -258,6 +258,8 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
         qemu_iovec_add(&local_qiov, buf, 512);
         qemu_iovec_concat(&local_qiov, qiov, 512, qiov->size - 512);
         qiov = &local_qiov;
+
+        flags &= ~BDRV_REQ_REGISTERED_BUF;
     }
 
     ret = raw_adjust_offset(bs, &offset, bytes, true);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 5/8] block: add BlockRAMRegistrar
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2022-03-23 11:17 ` [RFC 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 6/8] stubs: add memory_region_from_host() and memory_region_get_fd() Stefan Hajnoczi
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

Emulated devices and other BlockBackend users wishing to take advantage
of blk_register_buf() all have the same repetitive job: register
RAMBlocks with the BlockBackend using RAMBlockNotifier.

Add a BlockRAMRegistrar API to do this. A later commit will use this
from hw/block/virtio-blk.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS                          |  1 +
 include/sysemu/block-ram-registrar.h | 30 +++++++++++++++++++++
 block/block-ram-registrar.c          | 39 ++++++++++++++++++++++++++++
 block/meson.build                    |  1 +
 4 files changed, 71 insertions(+)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/block-ram-registrar.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fb08dd4f7..da6ec4d79b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2462,6 +2462,7 @@ F: block*
 F: block/
 F: hw/block/
 F: include/block/
+F: include/sysemu/block-*.h
 F: qemu-img*
 F: docs/tools/qemu-img.rst
 F: qemu-io*
diff --git a/include/sysemu/block-ram-registrar.h b/include/sysemu/block-ram-registrar.h
new file mode 100644
index 0000000000..09d63f64b2
--- /dev/null
+++ b/include/sysemu/block-ram-registrar.h
@@ -0,0 +1,30 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef BLOCK_RAM_REGISTRAR_H
+#define BLOCK_RAM_REGISTRAR_H
+
+#include "exec/ramlist.h"
+
+/**
+ * struct BlockRAMRegistrar:
+ *
+ * Keeps RAMBlock memory registered with a BlockBackend using
+ * blk_register_buf() including hotplugged memory.
+ *
+ * Emulated devices or other BlockBackend users initialize a BlockRAMRegistrar
+ * with blk_ram_registrar_init() before submitting I/O requests with the
+ * BLK_REQ_REGISTERED_BUF flag set.
+ */
+typedef struct {
+    BlockBackend *blk;
+    RAMBlockNotifier notifier;
+} BlockRAMRegistrar;
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk);
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r);
+
+#endif /* BLOCK_RAM_REGISTRAR_H */
diff --git a/block/block-ram-registrar.c b/block/block-ram-registrar.c
new file mode 100644
index 0000000000..32a14b69ae
--- /dev/null
+++ b/block/block-ram-registrar.c
@@ -0,0 +1,39 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
+
+static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+                            size_t max_size)
+{
+    BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+    blk_register_buf(r->blk, host, max_size);
+}
+
+static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+                              size_t max_size)
+{
+    BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+    blk_unregister_buf(r->blk, host, max_size);
+}
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk)
+{
+    r->blk = blk;
+    r->notifier = (RAMBlockNotifier){
+        .ram_block_added = ram_block_added,
+        .ram_block_removed = ram_block_removed,
+    };
+
+    ram_block_notifier_add(&r->notifier);
+}
+
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r)
+{
+    ram_block_notifier_remove(&r->notifier);
+}
diff --git a/block/meson.build b/block/meson.build
index 787667384a..b315593054 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -46,6 +46,7 @@ block_ss.add(files(
 ), zstd, zlib, gnutls)
 
 softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
+softmmu_ss.add(files('block-ram-registrar.c'))
 
 if get_option('qcow1').allowed()
   block_ss.add(files('qcow.c'))
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 6/8] stubs: add memory_region_from_host() and memory_region_get_fd()
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2022-03-23 11:17 ` [RFC 5/8] block: add BlockRAMRegistrar Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-23 11:17 ` [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization Stefan Hajnoczi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

The blkio block driver will need to look up the file descriptor for a
given pointer. This is possible in softmmu builds where the memory API
is available for querying guest RAM.

Add stubs so tools like qemu-img that link the block layer still build
successfully. In this case there is no guest RAM but that is fine.
Bounce buffers and their file descriptors will be allocated with
libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's
memory_region_get_fd() in that case.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 stubs/memory.c    | 13 +++++++++++++
 stubs/meson.build |  1 +
 2 files changed, 14 insertions(+)
 create mode 100644 stubs/memory.c

diff --git a/stubs/memory.c b/stubs/memory.c
new file mode 100644
index 0000000000..e9ec4e384b
--- /dev/null
+++ b/stubs/memory.c
@@ -0,0 +1,13 @@
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+
+MemoryRegion *memory_region_from_host(void *host, ram_addr_t *offset)
+{
+    return NULL;
+}
+
+int memory_region_get_fd(MemoryRegion *mr)
+{
+    return -1;
+}
+
diff --git a/stubs/meson.build b/stubs/meson.build
index 6f80fec761..1e274d2db2 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -25,6 +25,7 @@ stub_ss.add(files('is-daemonized.c'))
 if libaio.found()
   stub_ss.add(files('linux-aio.c'))
 endif
+stub_ss.add(files('memory.c'))
 stub_ss.add(files('migr-blocker.c'))
 stub_ss.add(files('module-opts.c'))
 stub_ss.add(files('monitor.c'))
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2022-03-23 11:17 ` [RFC 6/8] stubs: add memory_region_from_host() and memory_region_get_fd() Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-29 15:24   ` Stefano Garzarella
  2022-03-23 11:17 ` [RFC 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint Stefan Hajnoczi
  2022-03-29 15:27 ` [RFC 0/8] blkio: add libblkio BlockDriver Stefano Garzarella
  8 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

Avoid bounce buffers when QEMUIOVector elements are within previously
registered bdrv_register_buf() buffers.

The idea is that emulated storage controllers will register guest RAM
using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
requests. Therefore no blkio_add_mem_region() calls are necessary in the
performance-critical I/O code path.

This optimization doesn't apply if the I/O buffer is internally
allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
path because BDRV_REQ_REGISTERED_BUF is not set.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkio.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 104 insertions(+), 4 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index dd2308b967..78f4ca5f49 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -1,7 +1,9 @@
 #include "qemu/osdep.h"
 #include <blkio.h>
 #include "block/block_int.h"
+#include "exec/memory.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
 
@@ -26,6 +28,9 @@ typedef struct {
     /* Can we skip adding/deleting blkio_mem_regions? */
     bool needs_mem_regions;
 
+    /* Are file descriptors necessary for blkio_mem_regions? */
+    bool needs_mem_region_fd;
+
     /*
      * blkio_completion_fd_poll() stashes the next completion for
      * blkio_completion_fd_poll_ready().
@@ -170,6 +175,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
         BlockCompletionFunc *cb, void *opaque)
 {
     BDRVBlkioState *s = bs->opaque;
+    bool needs_mem_regions =
+        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
     struct iovec *iov = qiov->iov;
     int iovcnt = qiov->niov;
     BlkioAIOCB *acb;
@@ -179,7 +186,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
 
     acb = blkio_aiocb_get(bs, cb, opaque);
 
-    if (s->needs_mem_regions) {
+    if (needs_mem_regions) {
         if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
             qemu_aio_unref(&acb->common);
             return NULL;
@@ -194,7 +201,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
 
     ret = blkioq_readv(s->blkioq, offset, iov, iovcnt, acb, 0);
     if (ret < 0) {
-        if (s->needs_mem_regions) {
+        if (needs_mem_regions) {
             blkio_free_mem_region(s->blkio, &acb->mem_region);
             qemu_iovec_destroy(&acb->qiov);
         }
@@ -215,6 +222,8 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
 {
     uint32_t blkio_flags = (flags & BDRV_REQ_FUA) ? BLKIO_REQ_FUA : 0;
     BDRVBlkioState *s = bs->opaque;
+    bool needs_mem_regions =
+        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
     struct iovec *iov = qiov->iov;
     int iovcnt = qiov->niov;
     BlkioAIOCB *acb;
@@ -224,7 +233,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
 
     acb = blkio_aiocb_get(bs, cb, opaque);
 
-    if (s->needs_mem_regions) {
+    if (needs_mem_regions) {
         if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
             qemu_aio_unref(&acb->common);
             return NULL;
@@ -238,7 +247,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
 
     ret = blkioq_writev(s->blkioq, offset, iov, iovcnt, acb, blkio_flags);
     if (ret < 0) {
-        if (s->needs_mem_regions) {
+        if (needs_mem_regions) {
             blkio_free_mem_region(s->blkio, &acb->mem_region);
         }
         qemu_aio_unref(&acb->common);
@@ -286,6 +295,83 @@ static void blkio_io_unplug(BlockDriverState *bs)
     }
 }
 
+static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size)
+{
+    BDRVBlkioState *s = bs->opaque;
+    char *errmsg;
+    int ret;
+    struct blkio_mem_region region = (struct blkio_mem_region){
+        .addr = host,
+        .len = size,
+        .fd = -1,
+    };
+
+    if (((uintptr_t)host | size) % s->mem_region_alignment) {
+        error_report_once("%s: skipping unaligned buf %p with size %zu",
+                          __func__, host, size);
+        return; /* skip unaligned */
+    }
+
+    /* Attempt to find the fd for a MemoryRegion */
+    if (s->needs_mem_region_fd) {
+        int fd = -1;
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        /*
+         * bdrv_register_buf() is called with the BQL held so mr lives at least
+         * until this function returns.
+         */
+        mr = memory_region_from_host(host, &offset);
+        if (mr) {
+            fd = memory_region_get_fd(mr);
+        }
+        if (fd == -1) {
+            error_report_once("%s: skipping fd-less buf %p with size %zu",
+                              __func__, host, size);
+            return; /* skip if there is no fd */
+        }
+
+        region.fd = fd;
+        region.fd_offset = offset;
+    }
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        ret = blkio_add_mem_region(s->blkio, &region, &errmsg);
+    }
+
+    if (ret < 0) {
+        error_report_once("Failed to add blkio mem region %p with size %zu: %s",
+                          host, size, errmsg);
+        free(errmsg);
+    }
+}
+
+static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size)
+{
+    BDRVBlkioState *s = bs->opaque;
+    char *errmsg;
+    int ret;
+    struct blkio_mem_region region = (struct blkio_mem_region){
+        .addr = host,
+        .len = size,
+    };
+
+    if (((uintptr_t)host | size) % s->mem_region_alignment) {
+        return; /* skip unaligned */
+    }
+
+    WITH_QEMU_LOCK_GUARD(&s->lock) {
+        ret = blkio_del_mem_region(s->blkio, &region, &errmsg);
+    }
+
+    if (ret < 0) {
+        error_report_once("Failed to delete blkio mem region %p with size %zu: %s",
+                          host, size, errmsg);
+        free(errmsg);
+    }
+}
+
 static void blkio_parse_filename_io_uring(const char *filename, QDict *options,
                                           Error **errp)
 {
@@ -356,6 +442,18 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
         return ret;
     }
 
+    ret = blkio_get_bool(s->blkio,
+                         "needs-mem-region-fd",
+                         &s->needs_mem_region_fd,
+                         &errmsg);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "failed to get needs-mem-region-fd: %s", errmsg);
+        free(errmsg);
+        blkio_destroy(&s->blkio);
+        return ret;
+    }
+
     ret = blkio_get_uint64(s->blkio,
                            "mem-region-alignment",
                            &s->mem_region_alignment,
@@ -436,6 +534,8 @@ static BlockDriver bdrv_io_uring = {
     .bdrv_aio_pwritev           = blkio_aio_pwritev,
     .bdrv_aio_flush             = blkio_aio_flush,
     .bdrv_io_unplug             = blkio_io_unplug,
+    .bdrv_register_buf          = blkio_register_buf,
+    .bdrv_unregister_buf        = blkio_unregister_buf,
 
     /*
      * TODO
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2022-03-23 11:17 ` [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization Stefan Hajnoczi
@ 2022-03-23 11:17 ` Stefan Hajnoczi
  2022-03-29 15:27 ` [RFC 0/8] blkio: add libblkio BlockDriver Stefano Garzarella
  8 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-23 11:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, Fam Zheng,
	Eric Blake, sgarzare

Register guest RAM using BlockRAMRegistrar and set the
BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
accesses in I/O requests.

This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
on DMA mapping/unmapping.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/virtio/virtio-blk.h |  2 ++
 hw/block/virtio-blk.c          | 13 +++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d311c57cca..7f589b4146 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -19,6 +19,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
 #include "qom/object.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
@@ -64,6 +65,7 @@ struct VirtIOBlock {
     struct VirtIOBlockDataPlane *dataplane;
     uint64_t host_features;
     size_t config_size;
+    BlockRAMRegistrar blk_ram_registrar;
 };
 
 typedef struct VirtIOBlockReq {
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 540c38f829..a18cf05f14 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -21,6 +21,7 @@
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/block-ram-registrar.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "hw/virtio/virtio-blk.h"
@@ -421,11 +422,13 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
     }
 
     if (is_write) {
-        blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-                        virtio_blk_rw_complete, mrb->reqs[start]);
+        blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+                        BDRV_REQ_REGISTERED_BUF, virtio_blk_rw_complete,
+                        mrb->reqs[start]);
     } else {
-        blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-                       virtio_blk_rw_complete, mrb->reqs[start]);
+        blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+                       BDRV_REQ_REGISTERED_BUF, virtio_blk_rw_complete,
+                       mrb->reqs[start]);
     }
 }
 
@@ -1228,6 +1231,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     }
 
     s->change = qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s);
+    blk_ram_registrar_init(&s->blk_ram_registrar, s->blk);
     blk_set_dev_ops(s->blk, &virtio_block_ops, s);
     blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
 
@@ -1255,6 +1259,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev)
     }
     qemu_coroutine_decrease_pool_batch_size(conf->num_queues * conf->queue_size
                                             / 2);
+    blk_ram_registrar_destroy(&s->blk_ram_registrar);
     qemu_del_vm_change_state_handler(s->change);
     blockdev_mark_auto_del(s->blk);
     virtio_cleanup(vdev);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  2022-03-23 11:17 ` [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization Stefan Hajnoczi
@ 2022-03-29 15:24   ` Stefano Garzarella
  2022-03-30 10:45     ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2022-03-29 15:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, qemu-devel, Alberto Faria, Philippe Mathieu-Daudé,
	Yanan Wang, Eduardo Habkost, Hanna Reitz, Paolo Bonzini,
	Fam Zheng, Eric Blake, Markus Armbruster

On Wed, Mar 23, 2022 at 11:17:26AM +0000, Stefan Hajnoczi wrote:
>Avoid bounce buffers when QEMUIOVector elements are within previously
>registered bdrv_register_buf() buffers.
>
>The idea is that emulated storage controllers will register guest RAM
>using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
>requests. Therefore no blkio_add_mem_region() calls are necessary in the
>performance-critical I/O code path.
>
>This optimization doesn't apply if the I/O buffer is internally
>allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
>path because BDRV_REQ_REGISTERED_BUF is not set.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>---
> block/blkio.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 104 insertions(+), 4 deletions(-)
>
>diff --git a/block/blkio.c b/block/blkio.c
>index dd2308b967..78f4ca5f49 100644
>--- a/block/blkio.c
>+++ b/block/blkio.c
>@@ -1,7 +1,9 @@
> #include "qemu/osdep.h"
> #include <blkio.h>
> #include "block/block_int.h"
>+#include "exec/memory.h"
> #include "qapi/error.h"
>+#include "qemu/error-report.h"
> #include "qapi/qmp/qdict.h"
> #include "qemu/module.h"
>
>@@ -26,6 +28,9 @@ typedef struct {
>     /* Can we skip adding/deleting blkio_mem_regions? */
>     bool needs_mem_regions;
>
>+    /* Are file descriptors necessary for blkio_mem_regions? */
>+    bool needs_mem_region_fd;
>+
>     /*
>      * blkio_completion_fd_poll() stashes the next completion for
>      * blkio_completion_fd_poll_ready().
>@@ -170,6 +175,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
>         BlockCompletionFunc *cb, void *opaque)
> {
>     BDRVBlkioState *s = bs->opaque;
>+    bool needs_mem_regions =
>+        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
>     struct iovec *iov = qiov->iov;
>     int iovcnt = qiov->niov;
>     BlkioAIOCB *acb;
>@@ -179,7 +186,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
>
>     acb = blkio_aiocb_get(bs, cb, opaque);
>
>-    if (s->needs_mem_regions) {
>+    if (needs_mem_regions) {
>         if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
>             qemu_aio_unref(&acb->common);
>             return NULL;
>@@ -194,7 +201,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
>
>     ret = blkioq_readv(s->blkioq, offset, iov, iovcnt, acb, 0);
>     if (ret < 0) {
>-        if (s->needs_mem_regions) {
>+        if (needs_mem_regions) {
>             blkio_free_mem_region(s->blkio, &acb->mem_region);
>             qemu_iovec_destroy(&acb->qiov);
>         }
>@@ -215,6 +222,8 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
> {
>     uint32_t blkio_flags = (flags & BDRV_REQ_FUA) ? BLKIO_REQ_FUA : 0;
>     BDRVBlkioState *s = bs->opaque;
>+    bool needs_mem_regions =
>+        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
>     struct iovec *iov = qiov->iov;
>     int iovcnt = qiov->niov;
>     BlkioAIOCB *acb;
>@@ -224,7 +233,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
>
>     acb = blkio_aiocb_get(bs, cb, opaque);
>
>-    if (s->needs_mem_regions) {
>+    if (needs_mem_regions) {
>         if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
>             qemu_aio_unref(&acb->common);
>             return NULL;
>@@ -238,7 +247,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
>
>     ret = blkioq_writev(s->blkioq, offset, iov, iovcnt, acb, blkio_flags);
>     if (ret < 0) {
>-        if (s->needs_mem_regions) {
>+        if (needs_mem_regions) {
>             blkio_free_mem_region(s->blkio, &acb->mem_region);
>         }
>         qemu_aio_unref(&acb->common);
>@@ -286,6 +295,83 @@ static void blkio_io_unplug(BlockDriverState *bs)
>     }
> }
>
>+static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size)
>+{
>+    BDRVBlkioState *s = bs->opaque;
>+    char *errmsg;
>+    int ret;
>+    struct blkio_mem_region region = (struct blkio_mem_region){
>+        .addr = host,
>+        .len = size,
>+        .fd = -1,
>+    };
>+
>+    if (((uintptr_t)host | size) % s->mem_region_alignment) {
>+        error_report_once("%s: skipping unaligned buf %p with size %zu",
>+                          __func__, host, size);
>+        return; /* skip unaligned */
>+    }
>+
>+    /* Attempt to find the fd for a MemoryRegion */
>+    if (s->needs_mem_region_fd) {
>+        int fd = -1;
>+        ram_addr_t offset;
>+        MemoryRegion *mr;
>+
>+        /*
>+         * bdrv_register_buf() is called with the BQL held so mr lives at least
>+         * until this function returns.
>+         */
>+        mr = memory_region_from_host(host, &offset);
>+        if (mr) {
>+            fd = memory_region_get_fd(mr);
>+        }
>+        if (fd == -1) {
>+            error_report_once("%s: skipping fd-less buf %p with size %zu",
>+                              __func__, host, size);
>+            return; /* skip if there is no fd */
>+        }
>+
>+        region.fd = fd;
>+        region.fd_offset = offset;
>+    }
>+
>+    WITH_QEMU_LOCK_GUARD(&s->lock) {
>+        ret = blkio_add_mem_region(s->blkio, &region, &errmsg);
>+    }
>+
>+    if (ret < 0) {
>+        error_report_once("Failed to add blkio mem region %p with size %zu: %s",
>+                          host, size, errmsg);
>+        free(errmsg);
>+    }
>+}
>+
>+static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size)
>+{
>+    BDRVBlkioState *s = bs->opaque;
>+    char *errmsg;
>+    int ret;
>+    struct blkio_mem_region region = (struct blkio_mem_region){
>+        .addr = host,
>+        .len = size,
>+    };
>+
>+    if (((uintptr_t)host | size) % s->mem_region_alignment) {
>+        return; /* skip unaligned */
>+    }
>+
>+    WITH_QEMU_LOCK_GUARD(&s->lock) {
>+        ret = blkio_del_mem_region(s->blkio, &region, &errmsg);
>+    }
>+
>+    if (ret < 0) {
>+        error_report_once("Failed to delete blkio mem region %p with size %zu: %s",
>+                          host, size, errmsg);
>+        free(errmsg);
>+    }
>+}
>+
> static void blkio_parse_filename_io_uring(const char *filename, QDict *options,
>                                           Error **errp)
> {
>@@ -356,6 +442,18 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
>         return ret;
>     }
>
>+    ret = blkio_get_bool(s->blkio,
>+                         "needs-mem-region-fd",
>+                         &s->needs_mem_region_fd,
>+                         &errmsg);
>+    if (ret < 0) {
>+        error_setg_errno(errp, -ret,
>+                         "failed to get needs-mem-region-fd: %s", errmsg);
>+        free(errmsg);
>+        blkio_destroy(&s->blkio);
>+        return ret;
>+    }
>+
>     ret = blkio_get_uint64(s->blkio,
>                            "mem-region-alignment",
>                            &s->mem_region_alignment,

I already mentioned on IRC while testing the series, but I'm writing it 
here so we don't forget ;-)

To prevent bdrv_driver_pwritev() from removing the 
BDRV_REQ_REGISTERED_BUF flag from requests, we should add this: 

@@ -474,7 +474,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
          return ret;
      }

-    bs->supported_write_flags = BDRV_REQ_FUA;
+    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;

      qemu_mutex_init(&s->lock);
      s->blkioq = blkio_get_queue(s->blkio, 0);


>@@ -436,6 +534,8 @@ static BlockDriver bdrv_io_uring = {
>     .bdrv_aio_pwritev           = blkio_aio_pwritev,
>     .bdrv_aio_flush             = blkio_aio_flush,
>     .bdrv_io_unplug             = blkio_io_unplug,
>+    .bdrv_register_buf          = blkio_register_buf,
>+    .bdrv_unregister_buf        = blkio_unregister_buf,
>
>     /*
>      * TODO
>-- 
>2.35.1
>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 0/8] blkio: add libblkio BlockDriver
  2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2022-03-23 11:17 ` [RFC 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint Stefan Hajnoczi
@ 2022-03-29 15:27 ` Stefano Garzarella
  8 siblings, 0 replies; 13+ messages in thread
From: Stefano Garzarella @ 2022-03-29 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, qemu-devel, Alberto Faria, Philippe Mathieu-Daudé,
	Yanan Wang, Eduardo Habkost, Hanna Reitz, Paolo Bonzini,
	Fam Zheng, Eric Blake, Markus Armbruster

On Wed, Mar 23, 2022 at 11:17:19AM +0000, Stefan Hajnoczi wrote:
>This patch series adds a QEMU BlockDriver for libblkio
>(https://gitlab.com/libblkio/libblkio/), a library for high-performance block
>device I/O. Currently libblkio has basic io_uring support with additional
>drivers in development.
>
>The first patch adds the core BlockDriver and most of the libblkio API usage.
>The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
>API so virtio-blk emulation efficiently map guest RAM for libblkio - some
>libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
>vhost, etc).
>
>This block driver is incomplete because bdrv_refresh_limits() and several other
>APIs are not yet implemented. You can already boot a guest though. Once the
>missing gaps have been filled in I will send a non-RFC patch series.
>
>Regarding the design: each libblkio driver is a separately named BlockDriver.
>That means there is an "io_uring" BlockDriver and not a generic "libblkio"
>BlockDriver. In the future there will be additional BlockDrivers, all defined
>in block/blkio.c. This way QAPI and open parameters are type-safe and mandatory
>parameters can be checked by QEMU.

Indeed it was a great design. Adding the "vhost_vdpa" driver was quick 
and easy.

I've been reviewing and testing it. Aside from the problem in patch 7, 
the rest already looks good to me.

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove()
  2022-03-23 11:17 ` [RFC 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove() Stefan Hajnoczi
@ 2022-03-30  8:42   ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2022-03-30  8:42 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, Philippe Mathieu-Daudé,
	Alberto Faria, Markus Armbruster, Yanan Wang, Eduardo Habkost,
	Hanna Reitz, Paolo Bonzini, Fam Zheng, Eric Blake, sgarzare

On 23.03.22 12:17, Stefan Hajnoczi wrote:
> When a RAMBlockNotifier is added, ->ram_block_added() is called with all
> existing RAMBlocks. There is no equivalent ->ram_block_removed() call
> when a RAMBlockNotifier is removed.
> 
> The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine
> with this asymmetry because it does not rely on RAMBlockNotifier for
> cleanup. It walks its internal list of DMA mappings and unmaps them by
> itself.
> 
> Future users of RAMBlockNotifier may not have an internal data structure
> that records added RAMBlocks so they will need ->ram_block_removed()
> callbacks.
> 
> This patch makes ram_block_notifier_remove() symmetric with respect to
> callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings
> after ram_block_notifier_remove() has been called. This is necessary
> since users like block/nvme.c may create additional DMA mappings that do
> not originate from the RAMBlockNotifier.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  2022-03-29 15:24   ` Stefano Garzarella
@ 2022-03-30 10:45     ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-30 10:45 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Laurent Vivier, Kevin Wolf, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, qemu-block, Michael S. Tsirkin,
	John Snow, qemu-devel, Alberto Faria, Philippe Mathieu-Daudé,
	Yanan Wang, Eduardo Habkost, Hanna Reitz, Paolo Bonzini,
	Fam Zheng, Eric Blake, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 8421 bytes --]

On Tue, Mar 29, 2022 at 05:24:30PM +0200, Stefano Garzarella wrote:
> On Wed, Mar 23, 2022 at 11:17:26AM +0000, Stefan Hajnoczi wrote:
> > Avoid bounce buffers when QEMUIOVector elements are within previously
> > registered bdrv_register_buf() buffers.
> > 
> > The idea is that emulated storage controllers will register guest RAM
> > using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
> > requests. Therefore no blkio_add_mem_region() calls are necessary in the
> > performance-critical I/O code path.
> > 
> > This optimization doesn't apply if the I/O buffer is internally
> > allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
> > path because BDRV_REQ_REGISTERED_BUF is not set.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > block/blkio.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 104 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/blkio.c b/block/blkio.c
> > index dd2308b967..78f4ca5f49 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
> > @@ -1,7 +1,9 @@
> > #include "qemu/osdep.h"
> > #include <blkio.h>
> > #include "block/block_int.h"
> > +#include "exec/memory.h"
> > #include "qapi/error.h"
> > +#include "qemu/error-report.h"
> > #include "qapi/qmp/qdict.h"
> > #include "qemu/module.h"
> > 
> > @@ -26,6 +28,9 @@ typedef struct {
> >     /* Can we skip adding/deleting blkio_mem_regions? */
> >     bool needs_mem_regions;
> > 
> > +    /* Are file descriptors necessary for blkio_mem_regions? */
> > +    bool needs_mem_region_fd;
> > +
> >     /*
> >      * blkio_completion_fd_poll() stashes the next completion for
> >      * blkio_completion_fd_poll_ready().
> > @@ -170,6 +175,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
> >         BlockCompletionFunc *cb, void *opaque)
> > {
> >     BDRVBlkioState *s = bs->opaque;
> > +    bool needs_mem_regions =
> > +        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
> >     struct iovec *iov = qiov->iov;
> >     int iovcnt = qiov->niov;
> >     BlkioAIOCB *acb;
> > @@ -179,7 +186,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
> > 
> >     acb = blkio_aiocb_get(bs, cb, opaque);
> > 
> > -    if (s->needs_mem_regions) {
> > +    if (needs_mem_regions) {
> >         if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
> >             qemu_aio_unref(&acb->common);
> >             return NULL;
> > @@ -194,7 +201,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset,
> > 
> >     ret = blkioq_readv(s->blkioq, offset, iov, iovcnt, acb, 0);
> >     if (ret < 0) {
> > -        if (s->needs_mem_regions) {
> > +        if (needs_mem_regions) {
> >             blkio_free_mem_region(s->blkio, &acb->mem_region);
> >             qemu_iovec_destroy(&acb->qiov);
> >         }
> > @@ -215,6 +222,8 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
> > {
> >     uint32_t blkio_flags = (flags & BDRV_REQ_FUA) ? BLKIO_REQ_FUA : 0;
> >     BDRVBlkioState *s = bs->opaque;
> > +    bool needs_mem_regions =
> > +        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
> >     struct iovec *iov = qiov->iov;
> >     int iovcnt = qiov->niov;
> >     BlkioAIOCB *acb;
> > @@ -224,7 +233,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
> > 
> >     acb = blkio_aiocb_get(bs, cb, opaque);
> > 
> > -    if (s->needs_mem_regions) {
> > +    if (needs_mem_regions) {
> >         if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
> >             qemu_aio_unref(&acb->common);
> >             return NULL;
> > @@ -238,7 +247,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset,
> > 
> >     ret = blkioq_writev(s->blkioq, offset, iov, iovcnt, acb, blkio_flags);
> >     if (ret < 0) {
> > -        if (s->needs_mem_regions) {
> > +        if (needs_mem_regions) {
> >             blkio_free_mem_region(s->blkio, &acb->mem_region);
> >         }
> >         qemu_aio_unref(&acb->common);
> > @@ -286,6 +295,83 @@ static void blkio_io_unplug(BlockDriverState *bs)
> >     }
> > }
> > 
> > +static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    char *errmsg;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        error_report_once("%s: skipping unaligned buf %p with size %zu",
> > +                          __func__, host, size);
> > +        return; /* skip unaligned */
> > +    }
> > +
> > +    /* Attempt to find the fd for a MemoryRegion */
> > +    if (s->needs_mem_region_fd) {
> > +        int fd = -1;
> > +        ram_addr_t offset;
> > +        MemoryRegion *mr;
> > +
> > +        /*
> > +         * bdrv_register_buf() is called with the BQL held so mr lives at least
> > +         * until this function returns.
> > +         */
> > +        mr = memory_region_from_host(host, &offset);
> > +        if (mr) {
> > +            fd = memory_region_get_fd(mr);
> > +        }
> > +        if (fd == -1) {
> > +            error_report_once("%s: skipping fd-less buf %p with size %zu",
> > +                              __func__, host, size);
> > +            return; /* skip if there is no fd */
> > +        }
> > +
> > +        region.fd = fd;
> > +        region.fd_offset = offset;
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_add_mem_region(s->blkio, &region, &errmsg);
> > +    }
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to add blkio mem region %p with size %zu: %s",
> > +                          host, size, errmsg);
> > +        free(errmsg);
> > +    }
> > +}
> > +
> > +static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    char *errmsg;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        return; /* skip unaligned */
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_del_mem_region(s->blkio, &region, &errmsg);
> > +    }
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to delete blkio mem region %p with size %zu: %s",
> > +                          host, size, errmsg);
> > +        free(errmsg);
> > +    }
> > +}
> > +
> > static void blkio_parse_filename_io_uring(const char *filename, QDict *options,
> >                                           Error **errp)
> > {
> > @@ -356,6 +442,18 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
> >         return ret;
> >     }
> > 
> > +    ret = blkio_get_bool(s->blkio,
> > +                         "needs-mem-region-fd",
> > +                         &s->needs_mem_region_fd,
> > +                         &errmsg);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret,
> > +                         "failed to get needs-mem-region-fd: %s", errmsg);
> > +        free(errmsg);
> > +        blkio_destroy(&s->blkio);
> > +        return ret;
> > +    }
> > +
> >     ret = blkio_get_uint64(s->blkio,
> >                            "mem-region-alignment",
> >                            &s->mem_region_alignment,
> 
> I already mentioned on IRC while testing the series, but I'm writing it here
> so we don't forget ;-)
> 
> To prevent bdrv_driver_pwritev() from removing the BDRV_REQ_REGISTERED_BUF
> flag from requests, we should add this:
> 
> @@ -474,7 +474,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
>          return ret;
>      }
> 
> -    bs->supported_write_flags = BDRV_REQ_FUA;
> +    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
> 
>      qemu_mutex_init(&s->lock);
>      s->blkioq = blkio_get_queue(s->blkio, 0);

Thanks, will fix.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-03-30 10:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-23 11:17 [RFC 0/8] blkio: add libblkio BlockDriver Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 1/8] blkio: add io_uring block driver using libblkio Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove() Stefan Hajnoczi
2022-03-30  8:42   ` David Hildenbrand
2022-03-23 11:17 ` [RFC 3/8] block: pass size to bdrv_unregister_buf() Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 5/8] block: add BlockRAMRegistrar Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 6/8] stubs: add memory_region_from_host() and memory_region_get_fd() Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization Stefan Hajnoczi
2022-03-29 15:24   ` Stefano Garzarella
2022-03-30 10:45     ` Stefan Hajnoczi
2022-03-23 11:17 ` [RFC 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint Stefan Hajnoczi
2022-03-29 15:27 ` [RFC 0/8] blkio: add libblkio BlockDriver Stefano Garzarella

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.