All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend
@ 2014-06-27  8:24 Chrysostomos Nanakos
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-06-27  8:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chrysostomos Nanakos, stefanha

v6:
 - Split v5 1/4 patch into two different patches. First one implements
   QMP structured options and the second one implements bdrv_parse_filename().

v5:
 - Remove useless qemu_aio_count variable from BDRVArchipelagoState struct.
 - Cleanup xseg signal descriptor, call xseg_quit_local_signal() when closing
   block device.
 - Fix ds and volname leaks.
 - Make xseg request handler thread joinable and wait until exits before
   destroying condition variables and mutexes. Thanks to Stefan Hajnoczi for
   pointing this out.
 - Remove error_propagate() useless call.
 - Use memcpy instead of strncpy.
 - Remove check after trying to allocate memory with g_malloc().
 - Remove pipe code and complete AIO by introducing QEMU "bottom-half".
 - Add Archipelago shared memory segment name in options list and QMP.
 - Remove functions archipelago_aio_read()/_write() and introduce new
   and simpler function, __archipelago_submit_request().
   Refactor archipelago_aio_segmented_rw() function.
 - Enable Archipelago support in qemu-iotests

v4:
 - Move Archipelago QMP support from qapi-schema.json file to
   qapi/block-core.json. Fixe various typographic errors, thanks to
   Kevin Wolf and Eric Blake.
 - Use new .create_opts format, define new QemuOptsList structure and refactor
   qemu_archipelago_create function.

v3:
 - Break down initial patch from one to three. First patch implements
   Archipelago QEMU block backend with read/write functionality.
   Second patch implements .bdrv_create() and adds support for creating
   Archipelago images. Third patch adds QMP support.
 - Remove global variable g_xseg_init, make xseg_initialize(), xseg_join()
   and xseg_leave() reentrant and thread-safe.
 - Introduce new enum BlockdevOptionsArchipelago for the QMP support.

v2:
 - Implement .bdrv_parse_filename() function to convert the shortuct version
   with a single string to the individual options.
 - Remove global variables and move relevant fields to ArchipelagoAIOCB struct.
 - Remove ArchipelagoConf struct and use the relevant fields as individual
   arguments.
 - Remove ArchipelagoCB struct and use ArchipelagoAIOCB instead.
 - Remove ArchipelagoThread struct and move relevant fields to
   ArchipelagoAIOCB instead. Now an I/O thread is spawned for per-device to
   handle all async I/O requests.
 - Remove double data copy, use qemu_iovec_from_buf() and copy data directly
   to the destination buffer.
 - Remove archipelago_aio_bh_cb() function, a full request is completed in
   qemu_archipelago_complete_aio() instead.
 - Resolve proposed changes from Kevin Wolf and miscellaneous style issues.


Chrysostomos Nanakos (5):
  block: Support Archipelago as a QEMU block backend
  block/archipelago: Implement bdrv_parse_filename()
  block/archipelago: Add support for creating images
  QMP: Add support for Archipelago
  qemu-iotests: add support for Archipelago protocol

 MAINTAINERS                  |    6 +
 block/Makefile.objs          |    2 +
 block/archipelago.c          | 1103 ++++++++++++++++++++++++++++++++++++++++++
 configure                    |   40 ++
 qapi/block-core.json         |   39 +-
 tests/qemu-iotests/common    |    6 +
 tests/qemu-iotests/common.rc |    9 +-
 7 files changed, 1201 insertions(+), 4 deletions(-)
 create mode 100644 block/archipelago.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-06-27  8:24 [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend Chrysostomos Nanakos
@ 2014-06-27  8:24 ` Chrysostomos Nanakos
  2014-07-02 13:59   ` Eric Blake
                     ` (2 more replies)
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 2/5] block/archipelago: Implement bdrv_parse_filename() Chrysostomos Nanakos
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-06-27  8:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chrysostomos Nanakos, stefanha

VM Image on Archipelago volume is specified like this:

file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
file.vport=<vlmcd_port>][,file.segment=<segment_name>]]

'archipelago' is the protocol.

'mport' is the port number on which mapperd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.

'vport' is the port number on which vlmcd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.

'segment' is the name of the shared memory segment Archipelago stack is using.
This is optional and if not specified, QEMU will make Archipelago to use the
default value, 'archipelago'.

Examples:

file.driver=archipelago,file.volume=my_vm_volume
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234,file.segment=my_segment

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
---
 MAINTAINERS         |    6 +
 block/Makefile.objs |    2 +
 block/archipelago.c |  819 +++++++++++++++++++++++++++++++++++++++++++++++++++
 configure           |   40 +++
 4 files changed, 867 insertions(+)
 create mode 100644 block/archipelago.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9b93edd..58ef1e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -999,3 +999,9 @@ SSH
 M: Richard W.M. Jones <rjones@redhat.com>
 S: Supported
 F: block/ssh.c
+
+ARCHIPELAGO
+M: Chrysostomos Nanakos <cnanakos@grnet.gr>
+M: Chrysostomos Nanakos <chris@include.gr>
+S: Maintained
+F: block/archipelago.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index fd88c03..858d2b3 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -17,6 +17,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
+block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 endif
 
@@ -35,5 +36,6 @@ gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
 gluster.o-libs     := $(GLUSTERFS_LIBS)
 ssh.o-cflags       := $(LIBSSH2_CFLAGS)
 ssh.o-libs         := $(LIBSSH2_LIBS)
+archipelago.o-libs := $(ARCHIPELAGO_LIBS)
 qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
diff --git a/block/archipelago.c b/block/archipelago.c
new file mode 100644
index 0000000..c56826a
--- /dev/null
+++ b/block/archipelago.c
@@ -0,0 +1,819 @@
+/*
+ * QEMU Block driver for Archipelago
+ *
+ * Copyright 2014 GRNET S.A. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *   1. Redistributions of source code must retain the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer.
+ *   2. Redistributions in binary form must reproduce the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer in the documentation and/or other materials
+ *      provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY GRNET S.A. ``AS IS'' AND ANY EXPRESS
+ * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GRNET S.A OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
+ * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * The views and conclusions contained in the software and
+ * documentation are those of the authors and should not be
+ * interpreted as representing official policies, either expressed
+ * or implied, of GRNET S.A.
+ */
+
+/*
+* VM Image on Archipelago volume is specified like this:
+*
+* file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
+* file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
+*
+* 'archipelago' is the protocol.
+*
+* 'mport' is the port number on which mapperd is listening. This is optional
+* and if not specified, QEMU will make Archipelago to use the default port.
+*
+* 'vport' is the port number on which vlmcd is listening. This is optional
+* and if not specified, QEMU will make Archipelago to use the default port.
+*
+* 'segment' is the name of the shared memory segment Archipelago stack is using.
+* This is optional and if not specified, QEMU will make Archipelago to use the
+* default value, 'archipelago'.
+*
+* Examples:
+*
+* file.driver=archipelago,file.volume=my_vm_volume
+* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
+* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
+* file.vport=1234
+* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
+* file.vport=1234,file.segment=my_segment
+*/
+
+#include "block/block_int.h"
+#include "qemu/error-report.h"
+#include "qemu/thread.h"
+#include "qapi/qmp/qint.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qjson.h"
+
+#include <inttypes.h>
+#include <xseg/xseg.h>
+#include <xseg/protocol.h>
+
+#define ARCHIP_FD_READ      0
+#define ARCHIP_FD_WRITE     1
+#define MAX_REQUEST_SIZE    524288
+
+#define ARCHIPELAGO_OPT_VOLUME      "volume"
+#define ARCHIPELAGO_OPT_SEGMENT     "segment"
+#define ARCHIPELAGO_OPT_MPORT       "mport"
+#define ARCHIPELAGO_OPT_VPORT       "vport"
+
+#define archipelagolog(fmt, ...) \
+    do {                         \
+        fprintf(stderr, "archipelago\t%-24s: " fmt, __func__, ##__VA_ARGS__); \
+    } while (0)
+
+typedef enum {
+    ARCHIP_OP_READ,
+    ARCHIP_OP_WRITE,
+    ARCHIP_OP_FLUSH,
+    ARCHIP_OP_VOLINFO,
+} ARCHIPCmd;
+
+typedef struct ArchipelagoAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+    struct BDRVArchipelagoState *s;
+    QEMUIOVector *qiov;
+    void *buffer;
+    ARCHIPCmd cmd;
+    bool cancelled;
+    int status;
+    int64_t size;
+    int64_t ret;
+} ArchipelagoAIOCB;
+
+typedef struct BDRVArchipelagoState {
+    ArchipelagoAIOCB *event_acb;
+    char *volname;
+    char *segment_name;
+    uint64_t size;
+    /* Archipelago specific */
+    struct xseg *xseg;
+    struct xseg_port *port;
+    xport srcport;
+    xport sport;
+    xport mportno;
+    xport vportno;
+    QemuMutex archip_mutex;
+    QemuCond archip_cond;
+    bool is_signaled;
+    /* Request handler specific */
+    QemuThread request_th;
+    QemuCond request_cond;
+    QemuMutex request_mutex;
+    bool th_is_signaled;
+    bool stopping;
+} BDRVArchipelagoState;
+
+typedef struct ArchipelagoSegmentedRequest {
+    size_t count;
+    size_t total;
+    int ref;
+    int failed;
+} ArchipelagoSegmentedRequest;
+
+typedef struct AIORequestData {
+    const char *volname;
+    off_t offset;
+    size_t size;
+    uint64_t bufidx;
+    int ret;
+    int op;
+    ArchipelagoAIOCB *aio_cb;
+    ArchipelagoSegmentedRequest *segreq;
+} AIORequestData;
+
+static void qemu_archipelago_complete_aio(void *opaque);
+
+static void init_local_signal(struct xseg *xseg, xport sport, xport srcport)
+{
+    if (xseg && (sport != srcport)) {
+        xseg_init_local_signal(xseg, srcport);
+        sport = srcport;
+    }
+}
+
+static void archipelago_finish_aiocb(AIORequestData *reqdata)
+{
+    if (reqdata->aio_cb->ret != reqdata->segreq->total) {
+        reqdata->aio_cb->ret = -EIO;
+    } else if (reqdata->aio_cb->ret == reqdata->segreq->total) {
+        reqdata->aio_cb->ret = 0;
+    }
+    reqdata->aio_cb->bh = aio_bh_new(
+                        bdrv_get_aio_context(reqdata->aio_cb->common.bs),
+                        qemu_archipelago_complete_aio, reqdata
+                        );
+    qemu_bh_schedule(reqdata->aio_cb->bh);
+}
+
+static int wait_reply(struct xseg *xseg, xport srcport, struct xseg_port *port,
+                      struct xseg_request *expected_req)
+{
+    struct xseg_request *req;
+    xseg_prepare_wait(xseg, srcport);
+    void *psd = xseg_get_signal_desc(xseg, port);
+    while (1) {
+        req = xseg_receive(xseg, srcport, 0);
+        if (req) {
+            if (req != expected_req) {
+                archipelagolog("Unknown received request\n");
+                xseg_put_request(xseg, req, srcport);
+            } else if (!(req->state & XS_SERVED)) {
+                return -1;
+            } else {
+                break;
+            }
+        }
+        xseg_wait_signal(xseg, psd, 100000UL);
+    }
+    xseg_cancel_wait(xseg, srcport);
+    return 0;
+}
+
+static void xseg_request_handler(void *state)
+{
+    BDRVArchipelagoState *s = (BDRVArchipelagoState *) state;
+    void *psd = xseg_get_signal_desc(s->xseg, s->port);
+    qemu_mutex_lock(&s->request_mutex);
+
+    while (!s->stopping) {
+        struct xseg_request *req;
+        void *data;
+        xseg_prepare_wait(s->xseg, s->srcport);
+        req = xseg_receive(s->xseg, s->srcport, 0);
+        if (req) {
+            AIORequestData *reqdata;
+            ArchipelagoSegmentedRequest *segreq;
+            xseg_get_req_data(s->xseg, req, (void **)&reqdata);
+
+            switch (reqdata->op) {
+            case ARCHIP_OP_READ:
+                    data = xseg_get_data(s->xseg, req);
+                    segreq = reqdata->segreq;
+                    segreq->count += req->serviced;
+
+                    qemu_iovec_from_buf(reqdata->aio_cb->qiov, reqdata->bufidx,
+                                        data,
+                                        req->serviced);
+
+                    xseg_put_request(s->xseg, req, s->srcport);
+
+                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
+                        if (!segreq->failed) {
+                            reqdata->aio_cb->ret = segreq->count;
+                            archipelago_finish_aiocb(reqdata);
+                            g_free(segreq);
+                        } else {
+                            g_free(segreq);
+                            g_free(reqdata);
+                        }
+                    } else {
+                        g_free(reqdata);
+                    }
+                    break;
+            case ARCHIP_OP_WRITE:
+                    segreq = reqdata->segreq;
+                    segreq->count += req->serviced;
+                    xseg_put_request(s->xseg, req, s->srcport);
+
+                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
+                        if (!segreq->failed) {
+                            reqdata->aio_cb->ret = segreq->count;
+                            archipelago_finish_aiocb(reqdata);
+                            g_free(segreq);
+                        } else {
+                            g_free(segreq);
+                            g_free(reqdata);
+                        }
+                    } else {
+                        g_free(reqdata);
+                    }
+                    break;
+            case ARCHIP_OP_VOLINFO:
+                    s->is_signaled = true;
+                    qemu_cond_signal(&s->archip_cond);
+                    break;
+            }
+        } else {
+            xseg_wait_signal(s->xseg, psd, 100000UL);
+        }
+        xseg_cancel_wait(s->xseg, s->srcport);
+    }
+
+    s->th_is_signaled = true;
+    qemu_cond_signal(&s->request_cond);
+    qemu_mutex_unlock(&s->request_mutex);
+    qemu_thread_exit(NULL);
+}
+
+static int qemu_archipelago_xseg_init(BDRVArchipelagoState *s)
+{
+    if (xseg_initialize()) {
+        archipelagolog("Cannot initialize XSEG\n");
+        goto err_exit;
+    }
+
+    s->xseg = xseg_join((char *)"posix", s->segment_name,
+                        (char *)"posixfd", NULL);
+    if (!s->xseg) {
+        archipelagolog("Cannot join XSEG shared memory segment\n");
+        goto err_exit;
+    }
+    s->port = xseg_bind_dynport(s->xseg);
+    s->srcport = s->port->portno;
+    init_local_signal(s->xseg, s->sport, s->srcport);
+    return 0;
+
+err_exit:
+    return -1;
+}
+
+static int qemu_archipelago_init(BDRVArchipelagoState *s)
+{
+    int ret;
+
+    ret = qemu_archipelago_xseg_init(s);
+    if (ret < 0) {
+        error_report("Cannot initialize XSEG. Aborting...\n");
+        goto err_exit;
+    }
+
+    qemu_cond_init(&s->archip_cond);
+    qemu_mutex_init(&s->archip_mutex);
+    qemu_cond_init(&s->request_cond);
+    qemu_mutex_init(&s->request_mutex);
+    s->th_is_signaled = false;
+    qemu_thread_create(&s->request_th, "xseg_io_th",
+                       (void *) xseg_request_handler,
+                       (void *) s, QEMU_THREAD_JOINABLE);
+
+err_exit:
+    return ret;
+}
+
+static void qemu_archipelago_complete_aio(void *opaque)
+{
+    AIORequestData *reqdata = (AIORequestData *) opaque;
+    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;
+
+    qemu_bh_delete(aio_cb->bh);
+    qemu_vfree(aio_cb->buffer);
+    aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
+    aio_cb->status = 0;
+
+    if (!aio_cb->cancelled) {
+        qemu_aio_release(aio_cb);
+    }
+    g_free(reqdata);
+}
+
+static QemuOptsList archipelago_runtime_opts = {
+    .name = "archipelago",
+    .head = QTAILQ_HEAD_INITIALIZER(archipelago_runtime_opts.head),
+    .desc = {
+        {
+            .name = ARCHIPELAGO_OPT_VOLUME,
+            .type = QEMU_OPT_STRING,
+            .help = "Name of the volume image",
+        },
+        {
+            .name = ARCHIPELAGO_OPT_SEGMENT,
+            .type = QEMU_OPT_STRING,
+            .help = "Name of the Archipelago shared memory segment",
+        },
+        {
+            .name = ARCHIPELAGO_OPT_MPORT,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Archipelago mapperd port number"
+        },
+        {
+            .name = ARCHIPELAGO_OPT_VPORT,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Archipelago vlmcd port number"
+
+        },
+        { /* end of list */ }
+    },
+};
+
+static int qemu_archipelago_open(BlockDriverState *bs,
+                                 QDict *options,
+                                 int bdrv_flags,
+                                 Error **errp)
+{
+    int ret = 0;
+    const char *volume, *segment_name;
+    QemuOpts *opts;
+    Error *local_err = NULL;
+    BDRVArchipelagoState *s = bs->opaque;
+
+    opts = qemu_opts_create(&archipelago_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        qemu_opts_del(opts);
+        return -EINVAL;
+    }
+
+    s->mportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_MPORT, 1001);
+    s->vportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_VPORT, 501);
+
+    segment_name = qemu_opt_get(opts, ARCHIPELAGO_OPT_SEGMENT);
+    if (segment_name == NULL) {
+        s->segment_name = g_strdup("archipelago");
+    } else {
+        s->segment_name = g_strdup(segment_name);
+    }
+
+    volume = qemu_opt_get(opts, ARCHIPELAGO_OPT_VOLUME);
+    if (volume == NULL) {
+        error_setg(errp, "archipelago block driver requires the 'volume'"
+                   " option");
+        qemu_opts_del(opts);
+        return -EINVAL;
+    }
+    s->volname = g_strdup(volume);
+
+    /* Initialize XSEG, join shared memory segment */
+    ret = qemu_archipelago_init(s);
+    if (ret < 0) {
+        error_setg(errp, "cannot initialize XSEG and join shared "
+                   "memory segment");
+        goto err_exit;
+    }
+
+    qemu_opts_del(opts);
+    return 0;
+
+err_exit:
+    g_free(s->volname);
+    g_free(s->segment_name);
+    qemu_opts_del(opts);
+    return ret;
+}
+
+static void qemu_archipelago_close(BlockDriverState *bs)
+{
+    int r, targetlen;
+    char *target;
+    struct xseg_request *req;
+    BDRVArchipelagoState *s = bs->opaque;
+
+    s->stopping = true;
+
+    qemu_mutex_lock(&s->request_mutex);
+    while (!s->th_is_signaled) {
+        qemu_cond_wait(&s->request_cond,
+                       &s->request_mutex);
+    }
+    qemu_mutex_unlock(&s->request_mutex);
+    qemu_thread_join(&s->request_th);
+    qemu_cond_destroy(&s->request_cond);
+    qemu_mutex_destroy(&s->request_mutex);
+
+    qemu_cond_destroy(&s->archip_cond);
+    qemu_mutex_destroy(&s->archip_mutex);
+
+    targetlen = strlen(s->volname);
+    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
+    if (!req) {
+        archipelagolog("Cannot get XSEG request\n");
+        goto err_exit;
+    }
+    r = xseg_prep_request(s->xseg, req, targetlen, 0);
+    if (r < 0) {
+        xseg_put_request(s->xseg, req, s->srcport);
+        archipelagolog("Cannot prepare XSEG close request\n");
+        goto err_exit;
+    }
+
+    target = xseg_get_target(s->xseg, req);
+    memcpy(target, s->volname, targetlen);
+    req->size = req->datalen;
+    req->offset = 0;
+    req->op = X_CLOSE;
+
+    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
+    if (p == NoPort) {
+        xseg_put_request(s->xseg, req, s->srcport);
+        archipelagolog("Cannot submit XSEG close request\n");
+        goto err_exit;
+    }
+
+    xseg_signal(s->xseg, p);
+    wait_reply(s->xseg, s->srcport, s->port, req);
+
+    xseg_put_request(s->xseg, req, s->srcport);
+
+err_exit:
+    g_free(s->volname);
+    g_free(s->segment_name);
+    xseg_quit_local_signal(s->xseg, s->srcport);
+    xseg_leave_dynport(s->xseg, s->port);
+    xseg_leave(s->xseg);
+}
+
+static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb;
+    aio_cb->cancelled = true;
+    while (aio_cb->status == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+    qemu_aio_release(aio_cb);
+}
+
+static const AIOCBInfo archipelago_aiocb_info = {
+    .aiocb_size = sizeof(ArchipelagoAIOCB),
+    .cancel = qemu_archipelago_aio_cancel,
+};
+
+static int __archipelago_submit_request(BDRVArchipelagoState *s,
+                                        uint64_t bufidx,
+                                        size_t count,
+                                        off_t offset,
+                                        ArchipelagoAIOCB *aio_cb,
+                                        ArchipelagoSegmentedRequest *segreq,
+                                        int op)
+{
+    int ret, targetlen;
+    char *target;
+    void *data = NULL;
+    struct xseg_request *req;
+    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
+
+    targetlen = strlen(s->volname);
+    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
+    if (!req) {
+        archipelagolog("Cannot get XSEG request\n");
+        goto err_exit2;
+    }
+    ret = xseg_prep_request(s->xseg, req, targetlen, count);
+    if (ret < 0) {
+        archipelagolog("Cannot prepare XSEG request\n");
+        goto err_exit;
+    }
+    target = xseg_get_target(s->xseg, req);
+    if (!target) {
+        archipelagolog("Cannot get XSEG target\n");
+        goto err_exit;
+    }
+    memcpy(target, s->volname, targetlen);
+    req->size = count;
+    req->offset = offset;
+
+    switch (op) {
+    case ARCHIP_OP_READ:
+        req->op = X_READ;
+        break;
+    case ARCHIP_OP_WRITE:
+        req->op = X_WRITE;
+        break;
+    }
+    reqdata->volname = s->volname;
+    reqdata->offset = offset;
+    reqdata->size = count;
+    reqdata->bufidx = bufidx;
+    reqdata->aio_cb = aio_cb;
+    reqdata->segreq = segreq;
+    reqdata->op = op;
+
+    xseg_set_req_data(s->xseg, req, reqdata);
+    if (op == ARCHIP_OP_WRITE) {
+        data = xseg_get_data(s->xseg, req);
+        if (!data) {
+            archipelagolog("Cannot get XSEG data\n");
+            goto err_exit;
+        }
+        memcpy(data, aio_cb->buffer + bufidx, count);
+    }
+
+    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
+    if (p == NoPort) {
+        archipelagolog("Could not submit XSEG request\n");
+        goto err_exit;
+    }
+    xseg_signal(s->xseg, p);
+    return 0;
+
+err_exit:
+    g_free(reqdata);
+    xseg_put_request(s->xseg, req, s->srcport);
+    return -EIO;
+err_exit2:
+    g_free(reqdata);
+    return -EIO;
+}
+
+static int archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
+                                        size_t count,
+                                        off_t offset,
+                                        ArchipelagoAIOCB *aio_cb,
+                                        int op)
+{
+    int i, ret, segments_nr, last_segment_size;
+    ArchipelagoSegmentedRequest *segreq;
+
+    segreq = g_malloc(sizeof(ArchipelagoSegmentedRequest));
+
+    if (op == ARCHIP_OP_FLUSH) {
+        segments_nr = 1;
+        segreq->ref = segments_nr;
+        segreq->total = count;
+        segreq->count = 0;
+        segreq->failed = 0;
+        ret = __archipelago_submit_request(s, 0, count, offset, aio_cb,
+                                           segreq, ARCHIP_OP_WRITE);
+        if (ret < 0) {
+            goto err_exit;
+        }
+        return 0;
+    }
+
+    segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
+                  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
+    last_segment_size = (int)(count % MAX_REQUEST_SIZE);
+
+    segreq->ref = segments_nr;
+    segreq->total = count;
+    segreq->count = 0;
+    segreq->failed = 0;
+
+    for (i = 0; i < segments_nr - 1; i++) {
+        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
+                                           MAX_REQUEST_SIZE,
+                                           offset + i * MAX_REQUEST_SIZE,
+                                           aio_cb, segreq, op);
+
+        if (ret < 0) {
+            goto err_exit;
+        }
+    }
+
+    if ((segments_nr > 1) && last_segment_size) {
+        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
+                                           last_segment_size,
+                                           offset + i * MAX_REQUEST_SIZE,
+                                           aio_cb, segreq, op);
+    } else if ((segments_nr > 1) && !last_segment_size) {
+        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
+                                           MAX_REQUEST_SIZE,
+                                           offset + i * MAX_REQUEST_SIZE,
+                                           aio_cb, segreq, op);
+    } else if (segments_nr == 1) {
+            ret = __archipelago_submit_request(s, 0, count, offset, aio_cb,
+                                               segreq, op);
+    }
+
+    if (ret < 0) {
+        goto err_exit;
+    }
+
+    return 0;
+
+err_exit:
+    __sync_add_and_fetch(&segreq->failed, 1);
+    if (segments_nr == 1) {
+        if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
+            g_free(segreq);
+        }
+    } else {
+        if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
+            g_free(segreq);
+        }
+    }
+
+    return ret;
+}
+
+static BlockDriverAIOCB *qemu_archipelago_aio_rw(BlockDriverState *bs,
+                                                 int64_t sector_num,
+                                                 QEMUIOVector *qiov,
+                                                 int nb_sectors,
+                                                 BlockDriverCompletionFunc *cb,
+                                                 void *opaque,
+                                                 int op)
+{
+    ArchipelagoAIOCB *aio_cb;
+    BDRVArchipelagoState *s = bs->opaque;
+    int64_t size, off;
+    int ret;
+
+    aio_cb = qemu_aio_get(&archipelago_aiocb_info, bs, cb, opaque);
+    aio_cb->cmd = op;
+    aio_cb->qiov = qiov;
+
+    if (op != ARCHIP_OP_FLUSH) {
+        aio_cb->buffer = qemu_blockalign(bs, qiov->size);
+    } else {
+        aio_cb->buffer = NULL;
+    }
+
+    aio_cb->ret = 0;
+    aio_cb->s = s;
+    aio_cb->cancelled = false;
+    aio_cb->status = -EINPROGRESS;
+
+    if (op == ARCHIP_OP_WRITE) {
+        qemu_iovec_to_buf(aio_cb->qiov, 0, aio_cb->buffer, qiov->size);
+    }
+
+    off = sector_num * BDRV_SECTOR_SIZE;
+    size = nb_sectors * BDRV_SECTOR_SIZE;
+    aio_cb->size = size;
+
+    ret = archipelago_aio_segmented_rw(s, size, off,
+                                       aio_cb, op);
+    if (ret < 0) {
+        goto err_exit;
+    }
+    return &aio_cb->common;
+
+err_exit:
+    error_report("qemu_archipelago_aio_rw(): I/O Error\n");
+    qemu_vfree(aio_cb->buffer);
+    qemu_aio_release(aio_cb);
+    return NULL;
+}
+
+static BlockDriverAIOCB *qemu_archipelago_aio_readv(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_archipelago_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
+                                   opaque, ARCHIP_OP_READ);
+}
+
+static BlockDriverAIOCB *qemu_archipelago_aio_writev(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_archipelago_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
+                                   opaque, ARCHIP_OP_WRITE);
+}
+
+static int64_t archipelago_volume_info(BDRVArchipelagoState *s)
+{
+    uint64_t size;
+    int ret, targetlen;
+    struct xseg_request *req;
+    struct xseg_reply_info *xinfo;
+    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
+
+    const char *volname = s->volname;
+    targetlen = strlen(volname);
+    req = xseg_get_request(s->xseg, s->srcport, s->mportno, X_ALLOC);
+    if (!req) {
+        archipelagolog("Cannot get XSEG request\n");
+        goto err_exit2;
+    }
+    ret = xseg_prep_request(s->xseg, req, targetlen,
+                            sizeof(struct xseg_reply_info));
+    if (ret < 0) {
+        archipelagolog("Cannot prepare XSEG request\n");
+        goto err_exit;
+    }
+    char *target = xseg_get_target(s->xseg, req);
+    if (!target) {
+        archipelagolog("Cannot get XSEG target\n");
+        goto err_exit;
+    }
+    memcpy(target, volname, targetlen);
+    req->size = req->datalen;
+    req->offset = 0;
+    req->op = X_INFO;
+
+    reqdata->op = ARCHIP_OP_VOLINFO;
+    reqdata->volname = volname;
+    xseg_set_req_data(s->xseg, req, reqdata);
+
+    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
+    if (p == NoPort) {
+        archipelagolog("Cannot submit XSEG request\n");
+        goto err_exit;
+    }
+    xseg_signal(s->xseg, p);
+    qemu_mutex_lock(&s->archip_mutex);
+    while (!s->is_signaled) {
+        qemu_cond_wait(&s->archip_cond, &s->archip_mutex);
+    }
+    s->is_signaled = false;
+    qemu_mutex_unlock(&s->archip_mutex);
+
+    xinfo = (struct xseg_reply_info *) xseg_get_data(s->xseg, req);
+    size = xinfo->size;
+    xseg_put_request(s->xseg, req, s->srcport);
+    g_free(reqdata);
+    s->size = size;
+    return size;
+
+err_exit:
+    g_free(reqdata);
+    xseg_put_request(s->xseg, req, s->srcport);
+    return -1;
+err_exit2:
+    g_free(reqdata);
+    return -1;
+}
+
+static int64_t qemu_archipelago_getlength(BlockDriverState *bs)
+{
+    int64_t ret;
+    BDRVArchipelagoState *s = bs->opaque;
+
+    ret = archipelago_volume_info(s);
+    return ret;
+}
+
+static BlockDriverAIOCB *qemu_archipelago_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return qemu_archipelago_aio_rw(bs, 0, NULL, 0, cb, opaque,
+                                   ARCHIP_OP_FLUSH);
+}
+
+static BlockDriver bdrv_archipelago = {
+    .format_name         = "archipelago",
+    .protocol_name       = "archipelago",
+    .instance_size       = sizeof(BDRVArchipelagoState),
+    .bdrv_file_open      = qemu_archipelago_open,
+    .bdrv_close          = qemu_archipelago_close,
+    .bdrv_getlength      = qemu_archipelago_getlength,
+    .bdrv_aio_readv      = qemu_archipelago_aio_readv,
+    .bdrv_aio_writev     = qemu_archipelago_aio_writev,
+    .bdrv_aio_flush      = qemu_archipelago_aio_flush,
+    .bdrv_has_zero_init  = bdrv_has_zero_init_1,
+};
+
+static void bdrv_archipelago_init(void)
+{
+    bdrv_register(&bdrv_archipelago);
+}
+
+block_init(bdrv_archipelago_init);
diff --git a/configure b/configure
index 7102964..e4acd9c 100755
--- a/configure
+++ b/configure
@@ -326,6 +326,7 @@ seccomp=""
 glusterfs=""
 glusterfs_discard="no"
 glusterfs_zerofill="no"
+archipelago=""
 virtio_blk_data_plane=""
 gtk=""
 gtkabi=""
@@ -1087,6 +1088,10 @@ for opt do
   ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
+  --disable-archipelago) archipelago="no"
+  ;;
+  --enable-archipelago) archipelago="yes"
+  ;;
   --disable-virtio-blk-data-plane) virtio_blk_data_plane="no"
   ;;
   --enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
@@ -1382,6 +1387,8 @@ Advanced options (experts only):
   --enable-coroutine-pool  enable coroutine freelist (better performance)
   --enable-glusterfs       enable GlusterFS backend
   --disable-glusterfs      disable GlusterFS backend
+  --enable-archipelago     enable Archipelago backend
+  --disable-archipelago    disable Archipelago backend
   --enable-gcov            enable test coverage analysis with gcov
   --gcov=GCOV              use specified gcov [$gcov_tool]
   --disable-tpm            disable TPM support
@@ -3051,6 +3058,33 @@ EOF
   fi
 fi
 
+
+##########################################
+# archipelago probe
+if test "$archipelago" != "no" ; then
+    cat > $TMPC <<EOF
+#include <stdio.h>
+#include <xseg/xseg.h>
+#include <xseg/protocol.h>
+int main(void) {
+    xseg_initialize();
+    return 0;
+}
+EOF
+    archipelago_libs=-lxseg
+    if compile_prog "" "$archipelago_libs"; then
+        archipelago="yes"
+        libs_tools="$archipelago_libs $libs_tools"
+        libs_softmmu="$archipelago_libs $libs_softmmu"
+    else
+      if test "$archipelago" = "yes" ; then
+        feature_not_found "Archipelago backend support" "Install libxseg devel"
+      fi
+      archipelago="no"
+    fi
+fi
+
+
 ##########################################
 # glusterfs probe
 if test "$glusterfs" != "no" ; then
@@ -4230,6 +4264,7 @@ echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool    $coroutine_pool"
 echo "GlusterFS support $glusterfs"
+echo "Archipelago support $archipelago"
 echo "virtio-blk-data-plane $virtio_blk_data_plane"
 echo "gcov              $gcov_tool"
 echo "gcov enabled      $gcov"
@@ -4665,6 +4700,11 @@ if test "$glusterfs_zerofill" = "yes" ; then
   echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
 fi
 
+if test "$archipelago" = "yes" ; then
+  echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
+  echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
+fi
+
 if test "$libssh2" = "yes" ; then
   echo "CONFIG_LIBSSH2=m" >> $config_host_mak
   echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v6 2/5] block/archipelago: Implement bdrv_parse_filename()
  2014-06-27  8:24 [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend Chrysostomos Nanakos
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
@ 2014-06-27  8:24 ` Chrysostomos Nanakos
  2014-07-21 15:55   ` Stefan Hajnoczi
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images Chrysostomos Nanakos
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-06-27  8:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chrysostomos Nanakos, stefanha

VM Image on Archipelago volume can also be specified like this:

file=archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>][:
segment=<segment_name>]]

Examples:

file=archipelago:my_vm_volume
file=archipelago:my_vm_volume/mport=123
file=archipelago:my_vm_volume/mport=123:vport=1234
file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
---
 block/archipelago.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 2 deletions(-)

diff --git a/block/archipelago.c b/block/archipelago.c
index c56826a..3549454 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -40,6 +40,11 @@
 * file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
 * file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
 *
+* or
+*
+* file=archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>][:
+* segment=<segment_name>]]
+*
 * 'archipelago' is the protocol.
 *
 * 'mport' is the port number on which mapperd is listening. This is optional
@@ -57,11 +62,20 @@
 * file.driver=archipelago,file.volume=my_vm_volume
 * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
 * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
-* file.vport=1234
+*  file.vport=1234
 * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
-* file.vport=1234,file.segment=my_segment
+*  file.vport=1234,file.segment=my_segment
+*
+* or
+*
+* file=archipelago:my_vm_volume
+* file=archipelago:my_vm_volume/mport=123
+* file=archipelago:my_vm_volume/mport=123:vport=1234
+* file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment
+*
 */
 
+#include "qemu-common.h"
 #include "block/block_int.h"
 #include "qemu/error-report.h"
 #include "qemu/thread.h"
@@ -333,6 +347,126 @@ static void qemu_archipelago_complete_aio(void *opaque)
     g_free(reqdata);
 }
 
+static void xseg_find_port(char *pstr, const char *needle, xport *aport)
+{
+    const char *a;
+    char *endptr = NULL;
+    unsigned long port;
+    if (strstart(pstr, needle, &a)) {
+        if (strlen(a) > 0) {
+            port = strtoul(a, &endptr, 10);
+            if (strlen(endptr)) {
+                *aport = -2;
+                return;
+            }
+            *aport = (xport) port;
+        }
+    }
+}
+
+static void xseg_find_segment(char *pstr, const char *needle,
+                              char **segment_name)
+{
+    const char *a;
+    if (strstart(pstr, needle, &a)) {
+        if (strlen(a) > 0) {
+            *segment_name = g_strdup(a);
+        }
+    }
+}
+
+static void parse_filename_opts(const char *filename, Error **errp,
+                                char **volume, char **segment_name,
+                                xport *mport, xport *vport)
+{
+    const char *start;
+    char *tokens[4], *ds;
+    int idx;
+    xport lmport = NoPort, lvport = NoPort;
+
+    strstart(filename, "archipelago:", &start);
+
+    ds = g_strdup(start);
+    tokens[0] = strtok(ds, "/");
+    tokens[1] = strtok(NULL, ":");
+    tokens[2] = strtok(NULL, ":");
+    tokens[3] = strtok(NULL, "\0");
+
+    if (!strlen(tokens[0])) {
+        error_setg(errp, "volume name must be specified first");
+        g_free(ds);
+        return;
+    }
+
+    for (idx = 1; idx < 4; idx++) {
+        if (tokens[idx] != NULL) {
+            if (strstart(tokens[idx], "mport=", NULL)) {
+                xseg_find_port(tokens[idx], "mport=", &lmport);
+            }
+            if (strstart(tokens[idx], "vport=", NULL)) {
+                xseg_find_port(tokens[idx], "vport=", &lvport);
+            }
+            if (strstart(tokens[idx], "segment=", NULL)) {
+                xseg_find_segment(tokens[idx], "segment=", segment_name);
+            }
+        }
+    }
+
+    if ((lmport == -2) || (lvport == -2)) {
+        error_setg(errp, "mport and/or vport must be set");
+        g_free(ds);
+        return;
+    }
+    *volume = g_strdup(tokens[0]);
+    *mport = lmport;
+    *vport = lvport;
+    g_free(ds);
+}
+
+static void archipelago_parse_filename(const char *filename, QDict *options,
+                                       Error **errp)
+{
+    const char *start;
+    char *volume = NULL, *segment_name = NULL;
+    xport mport = NoPort, vport = NoPort;
+
+    if (qdict_haskey(options, ARCHIPELAGO_OPT_VOLUME)
+            || qdict_haskey(options, ARCHIPELAGO_OPT_SEGMENT)
+            || qdict_haskey(options, ARCHIPELAGO_OPT_MPORT)
+            || qdict_haskey(options, ARCHIPELAGO_OPT_VPORT)) {
+        error_setg(errp, "volume/mport/vport/segment and a file name may not be "
+                         "specified at the same time");
+        return;
+    }
+
+    if (!strstart(filename, "archipelago:", &start)) {
+        error_setg(errp, "File name must start with 'archipelago:'");
+        return;
+    }
+
+    if (!strlen(start) || strstart(start, "/", NULL)) {
+        error_setg(errp, "volume name must be specified");
+        return;
+    }
+
+    parse_filename_opts(filename, errp, &volume, &segment_name, &mport, &vport);
+
+    if (volume) {
+        qdict_put(options, ARCHIPELAGO_OPT_VOLUME, qstring_from_str(volume));
+        g_free(volume);
+    }
+    if (segment_name) {
+        qdict_put(options, ARCHIPELAGO_OPT_SEGMENT, qstring_from_str(segment_name));
+        g_free(segment_name);
+    }
+    if (mport != NoPort) {
+        qdict_put(options, ARCHIPELAGO_OPT_MPORT, qint_from_int(mport));
+    }
+    if (vport != NoPort) {
+        qdict_put(options, ARCHIPELAGO_OPT_VPORT, qint_from_int(vport));
+    }
+}
+
 static QemuOptsList archipelago_runtime_opts = {
     .name = "archipelago",
     .head = QTAILQ_HEAD_INITIALIZER(archipelago_runtime_opts.head),
@@ -802,6 +936,7 @@ static BlockDriver bdrv_archipelago = {
     .format_name         = "archipelago",
     .protocol_name       = "archipelago",
     .instance_size       = sizeof(BDRVArchipelagoState),
+    .bdrv_parse_filename = archipelago_parse_filename,
     .bdrv_file_open      = qemu_archipelago_open,
     .bdrv_close          = qemu_archipelago_close,
     .bdrv_getlength      = qemu_archipelago_getlength,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images
  2014-06-27  8:24 [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend Chrysostomos Nanakos
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 2/5] block/archipelago: Implement bdrv_parse_filename() Chrysostomos Nanakos
@ 2014-06-27  8:24 ` Chrysostomos Nanakos
  2014-07-02 14:01   ` Eric Blake
  2014-07-21 16:01   ` Stefan Hajnoczi
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago Chrysostomos Nanakos
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 5/5] qemu-iotests: add support for Archipelago protocol Chrysostomos Nanakos
  4 siblings, 2 replies; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-06-27  8:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chrysostomos Nanakos, stefanha

qemu-img archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>]
 [:segment=<segment_name>]] [size]

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
---
 block/archipelago.c |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)

diff --git a/block/archipelago.c b/block/archipelago.c
index 3549454..3d5aff1 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -613,6 +613,140 @@ err_exit:
     xseg_leave(s->xseg);
 }
 
+static int qemu_archipelago_create_volume(Error **errp, const char *volname,
+                                          char *segment_name,
+                                          uint64_t size, xport mportno,
+                                          xport vportno)
+{
+    int ret, targetlen;
+    struct xseg *xseg = NULL;
+    struct xseg_request *req;
+    struct xseg_request_clone *xclone;
+    struct xseg_port *port;
+    xport srcport = NoPort, sport = NoPort;
+    char *target;
+
+    /* Try default values if none has been set */
+    if (mportno == (xport) -1) {
+        mportno = 1001;
+    }
+
+    if (vportno == (xport) -1) {
+        vportno = 501;
+    }
+
+    if (segment_name == NULL) {
+        segment_name = g_strdup("archipelago");
+    }
+
+    if (xseg_initialize()) {
+        error_setg(errp, "Cannot initialize XSEG");
+        return -1;
+    }
+
+    xseg = xseg_join((char *)"posix", segment_name,
+                     (char *)"posixfd", NULL);
+
+    if (!xseg) {
+        error_setg(errp, "Cannot join XSEG shared memory segment");
+        return -1;
+    }
+
+    port = xseg_bind_dynport(xseg);
+    srcport = port->portno;
+    init_local_signal(xseg, sport, srcport);
+
+    req = xseg_get_request(xseg, srcport, mportno, X_ALLOC);
+    if (!req) {
+        error_setg(errp, "Cannot get XSEG request");
+        return -1;
+    }
+
+    targetlen = strlen(volname);
+    ret = xseg_prep_request(xseg, req, targetlen,
+                            sizeof(struct xseg_request_clone));
+    if (ret < 0) {
+        error_setg(errp, "Cannot prepare XSEG request");
+        goto err_exit;
+    }
+
+    target = xseg_get_target(xseg, req);
+    if (!target) {
+        error_setg(errp, "Cannot get XSEG target.\n");
+        goto err_exit;
+    }
+    memcpy(target, volname, targetlen);
+    xclone = (struct xseg_request_clone *) xseg_get_data(xseg, req);
+    memset(xclone->target, 0 , XSEG_MAX_TARGETLEN);
+    xclone->targetlen = 0;
+    xclone->size = size;
+    req->offset = 0;
+    req->size = req->datalen;
+    req->op = X_CLONE;
+
+    xport p = xseg_submit(xseg, req, srcport, X_ALLOC);
+    if (p == NoPort) {
+        error_setg(errp, "Could not submit XSEG request");
+        goto err_exit;
+    }
+    xseg_signal(xseg, p);
+
+    ret = wait_reply(xseg, srcport, port, req);
+    if (ret < 0) {
+        error_setg(errp, "wait_reply() error.");
+    }
+
+    xseg_put_request(xseg, req, srcport);
+    xseg_quit_local_signal(xseg, srcport);
+    xseg_leave_dynport(xseg, port);
+    xseg_leave(xseg);
+    return ret;
+
+err_exit:
+    xseg_put_request(xseg, req, srcport);
+    xseg_quit_local_signal(xseg, srcport);
+    xseg_leave_dynport(xseg, port);
+    xseg_leave(xseg);
+    return -1;
+}
+
+static int qemu_archipelago_create(const char *filename,
+                                   QemuOpts *options,
+                                   Error **errp)
+{
+    int ret = 0;
+    uint64_t total_size = 0;
+    char *volname = NULL, *segment_name = NULL;
+    const char *start;
+    xport mport = NoPort, vport = NoPort;
+
+    if (!strstart(filename, "archipelago:", &start)) {
+        error_setg(errp, "File name must start with 'archipelago:'");
+        return -1;
+    }
+
+    if (!strlen(start) || strstart(start, "/", NULL)) {
+        error_setg(errp, "volume name must be specified");
+        return -1;
+    }
+
+    parse_filename_opts(filename, errp, &volname, &segment_name, &mport, &vport);
+    total_size = qemu_opt_get_size_del(options, BLOCK_OPT_SIZE, 0);
+
+    /* Create an Archipelago volume */
+    ret = qemu_archipelago_create_volume(errp, volname, segment_name,
+                                         total_size, mport,
+                                         vport);
+
+    if (volname) {
+        g_free(volname);
+    }
+    if (segment_name) {
+        g_free(segment_name);
+    }
+    return ret;
+}
+
 static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb;
@@ -925,6 +1059,19 @@ static int64_t qemu_archipelago_getlength(BlockDriverState *bs)
     return ret;
 }
 
+static QemuOptsList qemu_archipelago_create_opts = {
+    .name = "archipelago-create-opts",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_archipelago_create_opts.head),
+    .desc = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Virtual disk size"
+        },
+        { /* end of list */ }
+    }
+};
+
 static BlockDriverAIOCB *qemu_archipelago_aio_flush(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque)
 {
@@ -939,11 +1086,13 @@ static BlockDriver bdrv_archipelago = {
     .bdrv_parse_filename = archipelago_parse_filename,
     .bdrv_file_open      = qemu_archipelago_open,
     .bdrv_close          = qemu_archipelago_close,
+    .bdrv_create         = qemu_archipelago_create,
     .bdrv_getlength      = qemu_archipelago_getlength,
     .bdrv_aio_readv      = qemu_archipelago_aio_readv,
     .bdrv_aio_writev     = qemu_archipelago_aio_writev,
     .bdrv_aio_flush      = qemu_archipelago_aio_flush,
     .bdrv_has_zero_init  = bdrv_has_zero_init_1,
+    .create_opts         = &qemu_archipelago_create_opts,
 };
 
 static void bdrv_archipelago_init(void)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago
  2014-06-27  8:24 [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend Chrysostomos Nanakos
                   ` (2 preceding siblings ...)
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images Chrysostomos Nanakos
@ 2014-06-27  8:24 ` Chrysostomos Nanakos
  2014-07-02 13:58   ` Eric Blake
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 5/5] qemu-iotests: add support for Archipelago protocol Chrysostomos Nanakos
  4 siblings, 1 reply; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-06-27  8:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chrysostomos Nanakos, stefanha

Introduce new enum BlockdevOptionsArchipelago.

@volume:              #Name of the Archipelago volume image

@mport:               #'mport' is the port number on which mapperd is
                      listening. This is optional and if not specified,
                      QEMU will make Archipelago to use the default port.

@vport:               #'vport' is the port number on which vlmcd is
                      listening. This is optional and if not specified,
                      QEMU will make Archipelago to use the default port.

@segment:             #optional The name of the shared memory segment
                      Archipelago stack is using. This is optional
                      and if not specified, QEMU will make Archipelago
                      use the default value, 'archipelago'.

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
---
 qapi/block-core.json |   39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index af6b436..55eb152 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -190,8 +190,8 @@
 # @ro: true if the backing device was open read-only
 #
 # @drv: the name of the block format used to open the backing device. As of
-#       0.14.0 this can be: 'blkdebug', 'bochs', 'cloop', 'cow', 'dmg',
-#       'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
+#       0.14.0 this can be: 'archipelago', 'blkdebug', 'bochs', 'cloop', 'cow',
+#       'dmg', 'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
 #       'host_floppy', 'http', 'https', 'nbd', 'parallels', 'qcow',
 #       'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'
 #
@@ -1077,7 +1077,7 @@
 # Since: 2.0
 ##
 { 'enum': 'BlockdevDriver',
-  'data': [ 'file', 'host_device', 'host_cdrom', 'host_floppy',
+  'data': [ 'archipelago', 'file', 'host_device', 'host_cdrom', 'host_floppy',
             'http', 'https', 'ftp', 'ftps', 'tftp', 'vvfat', 'blkdebug',
             'blkverify', 'bochs', 'cloop', 'cow', 'dmg', 'parallels', 'qcow',
             'qcow2', 'qed', 'raw', 'vdi', 'vhdx', 'vmdk', 'vpc', 'quorum' ] }
@@ -1207,6 +1207,38 @@
             '*pass-discard-snapshot': 'bool',
             '*pass-discard-other': 'bool' } }
 
+
+##
+# @BlockdevOptionsArchipelago
+#
+# Driver specific block device options for Archipelago.
+#
+# @volume:              Name of the Archipelago volume image
+#
+#
+# @mport:               #optional The port number on which mapperd is
+#                       listening. This is optional
+#                       and if not specified, QEMU will make Archipelago
+#                       use the default port.
+#
+# @vport:               #optional The port number on which vlmcd is
+#                       listening. This is optional
+#                       and if not specified, QEMU will make Archipelago
+#                       use the default port.
+#
+# @segment:             #optional The name of the shared memory segment
+#                       Archipelago stack is using. This is optional
+#                       and if not specified, QEMU will make Archipelago
+#                       use the default value, 'archipelago'.
+# Since: 2.1
+##
+{ 'type': 'BlockdevOptionsArchipelago',
+  'data': { 'volume': 'str',
+            '*mport': 'int',
+            '*vport': 'int',
+            '*segment': 'str' } }
+
+
 ##
 # @BlkdebugEvent
 #
@@ -1347,6 +1379,7 @@
   'base': 'BlockdevOptionsBase',
   'discriminator': 'driver',
   'data': {
+      'archipelago':'BlockdevOptionsArchipelago',
       'file':       'BlockdevOptionsFile',
       'host_device':'BlockdevOptionsFile',
       'host_cdrom': 'BlockdevOptionsFile',
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v6 5/5] qemu-iotests: add support for Archipelago protocol
  2014-06-27  8:24 [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend Chrysostomos Nanakos
                   ` (3 preceding siblings ...)
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago Chrysostomos Nanakos
@ 2014-06-27  8:24 ` Chrysostomos Nanakos
  2014-07-21 16:02   ` Stefan Hajnoczi
  4 siblings, 1 reply; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-06-27  8:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chrysostomos Nanakos, stefanha

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
---
 tests/qemu-iotests/common    |    6 ++++++
 tests/qemu-iotests/common.rc |    9 ++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 0aaf84d..a0e35c4 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -153,6 +153,7 @@ check options
     -nbd                test nbd
     -ssh                test ssh
     -nfs                test nfs
+    -archipelago        test archipelago
     -xdiff              graphical mode diff
     -nocache            use O_DIRECT on backing file
     -misalign           misalign memory allocations
@@ -264,6 +265,11 @@ testlist options
             xpand=false
             ;;
 
+        -archipelago)
+            IMGPROTO=archipelago
+            xpand=false
+            ;;
+
         -nocache)
             CACHEMODE="none"
             CACHEMODE_IS_DEFAULT=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 195c564..8ef1a52 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -64,6 +64,8 @@ elif [ "$IMGPROTO" = "ssh" ]; then
 elif [ "$IMGPROTO" = "nfs" ]; then
     TEST_DIR="nfs://127.0.0.1/$TEST_DIR"
     TEST_IMG=$TEST_DIR/t.$IMGFMT
+elif [ "$IMGPROTO" = "archipelago" ]; then
+    TEST_IMG="archipelago:at.$IMGFMT"
 else
     TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
@@ -163,7 +165,8 @@ _make_test_img()
             -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
             -e "s# block_size=[0-9]\\+##g" \
             -e "s# block_state_zero=\\(on\\|off\\)##g" \
-            -e "s# log_size=[0-9]\\+##g"
+            -e "s# log_size=[0-9]\\+##g" \
+            -e "s/archipelago:a/TEST_DIR\//g"
 
     # Start an NBD server on the image file, which is what we'll be talking to
     if [ $IMGPROTO = "nbd" ]; then
@@ -206,6 +209,10 @@ _cleanup_test_img()
             rbd --no-progress rm "$TEST_DIR/t.$IMGFMT" > /dev/null
             ;;
 
+        archipelago)
+            vlmc remove "at.$IMGFMT" > /dev/null
+            ;;
+
         sheepdog)
             collie vdi delete "$TEST_DIR/t.$IMGFMT"
             ;;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago Chrysostomos Nanakos
@ 2014-07-02 13:58   ` Eric Blake
  2014-07-02 14:11     ` Chrysostomos Nanakos
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2014-07-02 13:58 UTC (permalink / raw)
  To: Chrysostomos Nanakos, qemu-devel; +Cc: kwolf, stefanha

[-- Attachment #1: Type: text/plain, Size: 4440 bytes --]

On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
> Introduce new enum BlockdevOptionsArchipelago.
> 
> @volume:              #Name of the Archipelago volume image
> 
> @mport:               #'mport' is the port number on which mapperd is
>                       listening. This is optional and if not specified,
>                       QEMU will make Archipelago to use the default port.
> 
> @vport:               #'vport' is the port number on which vlmcd is
>                       listening. This is optional and if not specified,
>                       QEMU will make Archipelago to use the default port.
> 
> @segment:             #optional The name of the shared memory segment
>                       Archipelago stack is using. This is optional
>                       and if not specified, QEMU will make Archipelago
>                       use the default value, 'archipelago'.
> 
> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
> ---
>  qapi/block-core.json |   39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index af6b436..55eb152 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -190,8 +190,8 @@
>  # @ro: true if the backing device was open read-only
>  #
>  # @drv: the name of the block format used to open the backing device. As of
> -#       0.14.0 this can be: 'blkdebug', 'bochs', 'cloop', 'cow', 'dmg',
> -#       'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
> +#       0.14.0 this can be: 'archipelago', 'blkdebug', 'bochs', 'cloop', 'cow',
> +#       'dmg', 'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
>  #       'host_floppy', 'http', 'https', 'nbd', 'parallels', 'qcow',
>  #       'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'

The comment is wrong (although not necessarily your fault).  It should
really look more like:

As of 0.14.0, the following are supported:...
...probably some other releases mentioned, as I'm fairly certain not all
of the list was in 0.14.0...
As of 2.2, the following are also supported: 'archipelago'

>  #

> +##
> +# @BlockdevOptionsArchipelago
> +#
> +# Driver specific block device options for Archipelago.
> +#
> +# @volume:              Name of the Archipelago volume image
> +#
> +#
> +# @mport:               #optional The port number on which mapperd is

Why two blank lines?

> +#                       listening. This is optional
> +#                       and if not specified, QEMU will make Archipelago
> +#                       use the default port.

and that port number is?

> +#
> +# @vport:               #optional The port number on which vlmcd is
> +#                       listening. This is optional
> +#                       and if not specified, QEMU will make Archipelago
> +#                       use the default port.

and that port number is?

> +#
> +# @segment:             #optional The name of the shared memory segment
> +#                       Archipelago stack is using. This is optional
> +#                       and if not specified, QEMU will make Archipelago
> +#                       use the default value, 'archipelago'.
> +# Since: 2.1

At this point, I'm not sure if your series will make 2.1.  We've already
entered hard freeze, so even though your first post was before soft
freeze, it's hard to justify taking a new feature this late in the game
that still hasn't passed review.  You will probably have to change this
to 2.2 when rebasing.

> +##
> +{ 'type': 'BlockdevOptionsArchipelago',
> +  'data': { 'volume': 'str',
> +            '*mport': 'int',
> +            '*vport': 'int',
> +            '*segment': 'str' } }
> +

Looks reasonable.

> +
>  ##
>  # @BlkdebugEvent
>  #
> @@ -1347,6 +1379,7 @@
>    'base': 'BlockdevOptionsBase',
>    'discriminator': 'driver',
>    'data': {
> +      'archipelago':'BlockdevOptionsArchipelago',

Again, not necessarily your fault, but we ought to do a better job of
documenting when new union branches are added in later releases.

>        'file':       'BlockdevOptionsFile',
>        'host_device':'BlockdevOptionsFile',
>        'host_cdrom': 'BlockdevOptionsFile',
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
@ 2014-07-02 13:59   ` Eric Blake
  2014-07-02 14:18     ` Chrysostomos Nanakos
  2014-07-10  0:23   ` Jeff Cody
  2014-07-22 12:35   ` Stefan Hajnoczi
  2 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2014-07-02 13:59 UTC (permalink / raw)
  To: Chrysostomos Nanakos, qemu-devel; +Cc: kwolf, stefanha

[-- Attachment #1: Type: text/plain, Size: 2480 bytes --]

On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
> VM Image on Archipelago volume is specified like this:
> 
> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
> file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
> 
> 'archipelago' is the protocol.
> 
> 'mport' is the port number on which mapperd is listening. This is optional
> and if not specified, QEMU will make Archipelago to use the default port.
> 
> 'vport' is the port number on which vlmcd is listening. This is optional
> and if not specified, QEMU will make Archipelago to use the default port.
> 
> 'segment' is the name of the shared memory segment Archipelago stack is using.
> This is optional and if not specified, QEMU will make Archipelago to use the
> default value, 'archipelago'.
> 
> Examples:
> 
> file.driver=archipelago,file.volume=my_vm_volume
> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
> file.vport=1234
> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
> file.vport=1234,file.segment=my_segment
> 
> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
> ---

Just a high-level glance through, and not a thorough review.

The command line approach here looks reasonable, although it might be
easier to add the QAPI types first (patch 4/5) and then use that type in
this patch, instead of open-coding things.

> +++ b/block/archipelago.c
> @@ -0,0 +1,819 @@
> +/*
> + * QEMU Block driver for Archipelago
> + *
> + * Copyright 2014 GRNET S.A. All rights reserved.

Is it still legally open source if you reserve all rights, or does this
statement contradict the rest of your header?  (Not that you are the
first person to attempt this; the phrase "All rights reserved" appears
in a number of other files in qemu.git, including the mis-spelled
disas/libvixl/LICENCE)

> +
> +            switch (reqdata->op) {
> +            case ARCHIP_OP_READ:
> +                    data = xseg_get_data(s->xseg, req);
> +                    segreq = reqdata->segreq;

Coding style - indent by 4 spaces, not 8.


> +
> +static int __archipelago_submit_request(BDRVArchipelagoState *s,

Please don't name internal functions with leading __ - that namespace is
reserved for the compiler and libc.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images Chrysostomos Nanakos
@ 2014-07-02 14:01   ` Eric Blake
  2014-07-02 14:06     ` Chrysostomos Nanakos
  2014-07-21 16:01   ` Stefan Hajnoczi
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Blake @ 2014-07-02 14:01 UTC (permalink / raw)
  To: Chrysostomos Nanakos, qemu-devel; +Cc: kwolf, stefanha

[-- Attachment #1: Type: text/plain, Size: 1267 bytes --]

On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
> qemu-img archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>]
>  [:segment=<segment_name>]] [size]
> 
> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
> ---
>  block/archipelago.c |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 149 insertions(+)
> 

> +
> +    /* Try default values if none has been set */
> +    if (mportno == (xport) -1) {
> +        mportno = 1001;
> +    }
> +
> +    if (vportno == (xport) -1) {
> +        vportno = 501;
> +    }

I've now seen these magic numbers in more than one patch; it's worth a
#define or enum name.

> +
> +    if (segment_name == NULL) {
> +        segment_name = g_strdup("archipelago");
> +    }
> +
> +    if (xseg_initialize()) {
> +        error_setg(errp, "Cannot initialize XSEG");
> +        return -1;
> +    }
> +
> +    xseg = xseg_join((char *)"posix", segment_name,
> +                     (char *)"posixfd", NULL);

Do you really have to cast away const to use this interface?  Have you
reported that as a design bug to the authors of xseg_join?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images
  2014-07-02 14:01   ` Eric Blake
@ 2014-07-02 14:06     ` Chrysostomos Nanakos
  0 siblings, 0 replies; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-07-02 14:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, stefanha

On 07/02/2014 05:01 PM, Eric Blake wrote:
> On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
>> qemu-img archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>]
>>   [:segment=<segment_name>]] [size]
>>
>> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
>> ---
>>   block/archipelago.c |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 149 insertions(+)
>>
>> +
>> +    /* Try default values if none has been set */
>> +    if (mportno == (xport) -1) {
>> +        mportno = 1001;
>> +    }
>> +
>> +    if (vportno == (xport) -1) {
>> +        vportno = 501;
>> +    }
> I've now seen these magic numbers in more than one patch; it's worth a
> #define or enum name.

Yes for sure. I will use two #define's for that.

>> +
>> +    if (segment_name == NULL) {
>> +        segment_name = g_strdup("archipelago");
>> +    }
>> +
>> +    if (xseg_initialize()) {
>> +        error_setg(errp, "Cannot initialize XSEG");
>> +        return -1;
>> +    }
>> +
>> +    xseg = xseg_join((char *)"posix", segment_name,
>> +                     (char *)"posixfd", NULL);
> Do you really have to cast away const to use this interface?  Have you
> reported that as a design bug to the authors of xseg_join?

I will remove casting and fix xseg_join.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago
  2014-07-02 13:58   ` Eric Blake
@ 2014-07-02 14:11     ` Chrysostomos Nanakos
  2014-07-02 14:22       ` Eric Blake
  0 siblings, 1 reply; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-07-02 14:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, stefanha

On 07/02/2014 04:58 PM, Eric Blake wrote:
> On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
>> Introduce new enum BlockdevOptionsArchipelago.
>>
>> @volume:              #Name of the Archipelago volume image
>>
>> @mport:               #'mport' is the port number on which mapperd is
>>                        listening. This is optional and if not specified,
>>                        QEMU will make Archipelago to use the default port.
>>
>> @vport:               #'vport' is the port number on which vlmcd is
>>                        listening. This is optional and if not specified,
>>                        QEMU will make Archipelago to use the default port.
>>
>> @segment:             #optional The name of the shared memory segment
>>                        Archipelago stack is using. This is optional
>>                        and if not specified, QEMU will make Archipelago
>>                        use the default value, 'archipelago'.
>>
>> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
>> ---
>>   qapi/block-core.json |   39 ++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index af6b436..55eb152 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -190,8 +190,8 @@
>>   # @ro: true if the backing device was open read-only
>>   #
>>   # @drv: the name of the block format used to open the backing device. As of
>> -#       0.14.0 this can be: 'blkdebug', 'bochs', 'cloop', 'cow', 'dmg',
>> -#       'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
>> +#       0.14.0 this can be: 'archipelago', 'blkdebug', 'bochs', 'cloop', 'cow',
>> +#       'dmg', 'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
>>   #       'host_floppy', 'http', 'https', 'nbd', 'parallels', 'qcow',
>>   #       'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'
> The comment is wrong (although not necessarily your fault).  It should
> really look more like:
>
> As of 0.14.0, the following are supported:...
> ...probably some other releases mentioned, as I'm fairly certain not all
> of the list was in 0.14.0...
> As of 2.2, the following are also supported: 'archipelago'

After rebasing I can add 'archipelago' in the 2.2 list. Do you agree?

>
>>   #
>> +##
>> +# @BlockdevOptionsArchipelago
>> +#
>> +# Driver specific block device options for Archipelago.
>> +#
>> +# @volume:              Name of the Archipelago volume image
>> +#
>> +#
>> +# @mport:               #optional The port number on which mapperd is
> Why two blank lines?
>
>> +#                       listening. This is optional
>> +#                       and if not specified, QEMU will make Archipelago
>> +#                       use the default port.
> and that port number is?
>
>> +#
>> +# @vport:               #optional The port number on which vlmcd is
>> +#                       listening. This is optional
>> +#                       and if not specified, QEMU will make Archipelago
>> +#                       use the default port.
> and that port number is?

Sorry for that, I will add the default ports.

>
>> +#
>> +# @segment:             #optional The name of the shared memory segment
>> +#                       Archipelago stack is using. This is optional
>> +#                       and if not specified, QEMU will make Archipelago
>> +#                       use the default value, 'archipelago'.
>> +# Since: 2.1
> At this point, I'm not sure if your series will make 2.1.  We've already
> entered hard freeze, so even though your first post was before soft
> freeze, it's hard to justify taking a new feature this late in the game
> that still hasn't passed review.  You will probably have to change this
> to 2.2 when rebasing.
>
>> +##
>> +{ 'type': 'BlockdevOptionsArchipelago',
>> +  'data': { 'volume': 'str',
>> +            '*mport': 'int',
>> +            '*vport': 'int',
>> +            '*segment': 'str' } }
>> +
> Looks reasonable.
>
>> +
>>   ##
>>   # @BlkdebugEvent
>>   #
>> @@ -1347,6 +1379,7 @@
>>     'base': 'BlockdevOptionsBase',
>>     'discriminator': 'driver',
>>     'data': {
>> +      'archipelago':'BlockdevOptionsArchipelago',
> Again, not necessarily your fault, but we ought to do a better job of
> documenting when new union branches are added in later releases.

Any proposition for this one?

>
>>         'file':       'BlockdevOptionsFile',
>>         'host_device':'BlockdevOptionsFile',
>>         'host_cdrom': 'BlockdevOptionsFile',
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-07-02 13:59   ` Eric Blake
@ 2014-07-02 14:18     ` Chrysostomos Nanakos
  2014-07-02 14:30       ` Eric Blake
  0 siblings, 1 reply; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-07-02 14:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, stefanha

On 07/02/2014 04:59 PM, Eric Blake wrote:
> On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
>> VM Image on Archipelago volume is specified like this:
>>
>> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
>> file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
>>
>> 'archipelago' is the protocol.
>>
>> 'mport' is the port number on which mapperd is listening. This is optional
>> and if not specified, QEMU will make Archipelago to use the default port.
>>
>> 'vport' is the port number on which vlmcd is listening. This is optional
>> and if not specified, QEMU will make Archipelago to use the default port.
>>
>> 'segment' is the name of the shared memory segment Archipelago stack is using.
>> This is optional and if not specified, QEMU will make Archipelago to use the
>> default value, 'archipelago'.
>>
>> Examples:
>>
>> file.driver=archipelago,file.volume=my_vm_volume
>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>> file.vport=1234
>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>> file.vport=1234,file.segment=my_segment
>>
>> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
>> ---
> Just a high-level glance through, and not a thorough review.
>
> The command line approach here looks reasonable, although it might be
> easier to add the QAPI types first (patch 4/5) and then use that type in
> this patch, instead of open-coding things.

If I understand well the order of the commit should change? Patch 4/5 
should be first (1/5) and then 1/5 should be 2/5 and so on?

>
>> +++ b/block/archipelago.c
>> @@ -0,0 +1,819 @@
>> +/*
>> + * QEMU Block driver for Archipelago
>> + *
>> + * Copyright 2014 GRNET S.A. All rights reserved.
> Is it still legally open source if you reserve all rights, or does this
> statement contradict the rest of your header?  (Not that you are the
> first person to attempt this; the phrase "All rights reserved" appears
> in a number of other files in qemu.git, including the mis-spelled
> disas/libvixl/LICENCE)

This is an error. I apologize. I will remove "All rights reserved" and 
let the rest as is in the next patch series. Do you agree?

>
>> +
>> +            switch (reqdata->op) {
>> +            case ARCHIP_OP_READ:
>> +                    data = xseg_get_data(s->xseg, req);
>> +                    segreq = reqdata->segreq;
> Coding style - indent by 4 spaces, not 8.
>
>
>> +
>> +static int __archipelago_submit_request(BDRVArchipelagoState *s,
> Please don't name internal functions with leading __ - that namespace is
> reserved for the compiler and libc.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago
  2014-07-02 14:11     ` Chrysostomos Nanakos
@ 2014-07-02 14:22       ` Eric Blake
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2014-07-02 14:22 UTC (permalink / raw)
  To: Chrysostomos Nanakos, qemu-devel; +Cc: kwolf, stefanha

[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

On 07/02/2014 08:11 AM, Chrysostomos Nanakos wrote:
> On 07/02/2014 04:58 PM, Eric Blake wrote:
>> On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
>>> Introduce new enum BlockdevOptionsArchipelago.
>>>

>>> @@ -1347,6 +1379,7 @@
>>>     'base': 'BlockdevOptionsBase',
>>>     'discriminator': 'driver',
>>>     'data': {
>>> +      'archipelago':'BlockdevOptionsArchipelago',
>> Again, not necessarily your fault, but we ought to do a better job of
>> documenting when new union branches are added in later releases.
> 
> Any proposition for this one?

Here's what was done for NetClientOptions:

# Since 1.2
#
# 'l2tpv3' - since 2.1

although doing it properly would best be done as a separate patch that
audits all qapi types.  I'm not trying to force you to take on the
documentation cleanup patch, although getting just your addition to
mention when it was added will make that cleanup audit easier for
whoever takes that task on.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-07-02 14:18     ` Chrysostomos Nanakos
@ 2014-07-02 14:30       ` Eric Blake
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2014-07-02 14:30 UTC (permalink / raw)
  To: Chrysostomos Nanakos, qemu-devel; +Cc: kwolf, stefanha

[-- Attachment #1: Type: text/plain, Size: 2119 bytes --]

On 07/02/2014 08:18 AM, Chrysostomos Nanakos wrote:
> On 07/02/2014 04:59 PM, Eric Blake wrote:
>> On 06/27/2014 02:24 AM, Chrysostomos Nanakos wrote:
>>> VM Image on Archipelago volume is specified like this:
>>>
>>> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
>>>
>>> file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
>>>
>>> 'archipelago' is the protocol.
>>>

>> Just a high-level glance through, and not a thorough review.
>>
>> The command line approach here looks reasonable, although it might be
>> easier to add the QAPI types first (patch 4/5) and then use that type in
>> this patch, instead of open-coding things.
> 
> If I understand well the order of the commit should change? Patch 4/5
> should be first (1/5) and then 1/5 should be 2/5 and so on?

'git rebase -i' can be used to reorder patches; my question is whether
the current patch 4/5 should be hoisted earlier into the series (either
squashed in with the current patch 1, or at least adjacent to it - but
maybe moving it to 2/5 rather than 1/5).  What I'm less sure of is
whether the auto-generated types created by declaring your structure in
the .json file will make life in this patch easier by reusing that
struct, instead of open-coding your QemuOpts.  So it is more of a
question on my end on how much code can be shared between the QemuOpts
of the command line and the QMP additions (since I haven't actually
written a block device myself).  On looking a bit closer, it looks like
you have done things in the correct order.  After all, the command line
parsing was implemented first on most other block devices, then we added
QMP blockdev-add, and its implementation is merely taking a generated
QAPI struct, and converting it into a QemuOpts data structure that can
then be fed to the pre-existing command line code.

At any rate, the important part is that each patch compiles in
isolation, and that you aren't duplicating large portions of code.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
  2014-07-02 13:59   ` Eric Blake
@ 2014-07-10  0:23   ` Jeff Cody
  2014-07-10 10:04     ` Chrysostomos Nanakos
  2014-07-22 12:35   ` Stefan Hajnoczi
  2 siblings, 1 reply; 22+ messages in thread
From: Jeff Cody @ 2014-07-10  0:23 UTC (permalink / raw)
  To: Chrysostomos Nanakos; +Cc: kwolf, qemu-devel, stefanha

On Fri, Jun 27, 2014 at 11:24:08AM +0300, Chrysostomos Nanakos wrote:
> VM Image on Archipelago volume is specified like this:
> 
> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
> file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
> 
> 'archipelago' is the protocol.
> 
> 'mport' is the port number on which mapperd is listening. This is optional
> and if not specified, QEMU will make Archipelago to use the default port.
> 
> 'vport' is the port number on which vlmcd is listening. This is optional
> and if not specified, QEMU will make Archipelago to use the default port.
> 
> 'segment' is the name of the shared memory segment Archipelago stack is using.
> This is optional and if not specified, QEMU will make Archipelago to use the
> default value, 'archipelago'.
> 
> Examples:
> 
> file.driver=archipelago,file.volume=my_vm_volume
> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
> file.vport=1234
> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
> file.vport=1234,file.segment=my_segment
> 
> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>

This is just a superficial review, because I don't have a good idea of
what archipelago or libxseg really does (I didn't even compile it or
these patches).  But I scanned through this patch, and found a few
things, and had a few questions.

> ---
>  MAINTAINERS         |    6 +
>  block/Makefile.objs |    2 +
>  block/archipelago.c |  819 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure           |   40 +++
>  4 files changed, 867 insertions(+)
>  create mode 100644 block/archipelago.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9b93edd..58ef1e3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -999,3 +999,9 @@ SSH
>  M: Richard W.M. Jones <rjones@redhat.com>
>  S: Supported
>  F: block/ssh.c
> +
> +ARCHIPELAGO
> +M: Chrysostomos Nanakos <cnanakos@grnet.gr>
> +M: Chrysostomos Nanakos <chris@include.gr>
> +S: Maintained
> +F: block/archipelago.c
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index fd88c03..858d2b3 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -17,6 +17,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
>  block-obj-$(CONFIG_CURL) += curl.o
>  block-obj-$(CONFIG_RBD) += rbd.o
>  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
> +block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>  block-obj-$(CONFIG_LIBSSH2) += ssh.o
>  endif
>  
> @@ -35,5 +36,6 @@ gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
>  gluster.o-libs     := $(GLUSTERFS_LIBS)
>  ssh.o-cflags       := $(LIBSSH2_CFLAGS)
>  ssh.o-libs         := $(LIBSSH2_LIBS)
> +archipelago.o-libs := $(ARCHIPELAGO_LIBS)
>  qcow.o-libs        := -lz
>  linux-aio.o-libs   := -laio
> diff --git a/block/archipelago.c b/block/archipelago.c
> new file mode 100644
> index 0000000..c56826a
> --- /dev/null
> +++ b/block/archipelago.c
> @@ -0,0 +1,819 @@
> +/*
> + * QEMU Block driver for Archipelago
> + *
> + * Copyright 2014 GRNET S.A. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + *   1. Redistributions of source code must retain the above
> + *      copyright notice, this list of conditions and the following
> + *      disclaimer.
> + *   2. Redistributions in binary form must reproduce the above
> + *      copyright notice, this list of conditions and the following
> + *      disclaimer in the documentation and/or other materials
> + *      provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY GRNET S.A. ``AS IS'' AND ANY EXPRESS
> + * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GRNET S.A OR
> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> + * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
> + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + *
> + * The views and conclusions contained in the software and
> + * documentation are those of the authors and should not be
> + * interpreted as representing official policies, either expressed
> + * or implied, of GRNET S.A.
> + */
> +
> +/*
> +* VM Image on Archipelago volume is specified like this:
> +*
> +* file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
> +* file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
> +*
> +* 'archipelago' is the protocol.
> +*
> +* 'mport' is the port number on which mapperd is listening. This is optional
> +* and if not specified, QEMU will make Archipelago to use the default port.
> +*
> +* 'vport' is the port number on which vlmcd is listening. This is optional
> +* and if not specified, QEMU will make Archipelago to use the default port.
> +*
> +* 'segment' is the name of the shared memory segment Archipelago stack is using.
> +* This is optional and if not specified, QEMU will make Archipelago to use the
> +* default value, 'archipelago'.
> +*
> +* Examples:
> +*
> +* file.driver=archipelago,file.volume=my_vm_volume
> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
> +* file.vport=1234
> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
> +* file.vport=1234,file.segment=my_segment
> +*/
> +
> +#include "block/block_int.h"
> +#include "qemu/error-report.h"
> +#include "qemu/thread.h"
> +#include "qapi/qmp/qint.h"
> +#include "qapi/qmp/qstring.h"
> +#include "qapi/qmp/qjson.h"
> +
> +#include <inttypes.h>
> +#include <xseg/xseg.h>
> +#include <xseg/protocol.h>
> +
> +#define ARCHIP_FD_READ      0
> +#define ARCHIP_FD_WRITE     1
> +#define MAX_REQUEST_SIZE    524288
> +
> +#define ARCHIPELAGO_OPT_VOLUME      "volume"
> +#define ARCHIPELAGO_OPT_SEGMENT     "segment"
> +#define ARCHIPELAGO_OPT_MPORT       "mport"
> +#define ARCHIPELAGO_OPT_VPORT       "vport"
> +
> +#define archipelagolog(fmt, ...) \
> +    do {                         \
> +        fprintf(stderr, "archipelago\t%-24s: " fmt, __func__, ##__VA_ARGS__); \
> +    } while (0)
> +
> +typedef enum {
> +    ARCHIP_OP_READ,
> +    ARCHIP_OP_WRITE,
> +    ARCHIP_OP_FLUSH,
> +    ARCHIP_OP_VOLINFO,
> +} ARCHIPCmd;
> +
> +typedef struct ArchipelagoAIOCB {
> +    BlockDriverAIOCB common;
> +    QEMUBH *bh;
> +    struct BDRVArchipelagoState *s;
> +    QEMUIOVector *qiov;
> +    void *buffer;
> +    ARCHIPCmd cmd;
> +    bool cancelled;
> +    int status;
> +    int64_t size;
> +    int64_t ret;
> +} ArchipelagoAIOCB;
> +
> +typedef struct BDRVArchipelagoState {
> +    ArchipelagoAIOCB *event_acb;
> +    char *volname;
> +    char *segment_name;
> +    uint64_t size;
> +    /* Archipelago specific */
> +    struct xseg *xseg;

I assume s->xseg is allocated in xseg_join() - is it ever freed?  In
_close(), there is a final call to xseg_leave(s->xseg), but from what
I found in libxseg, it does not appear to be freed:
https://github.com/cnanakos/libxseg/blob/develop/src/xseg.c#L975

Is it up to libxseg to free xseg, or the caller?


> +    struct xseg_port *port;
> +    xport srcport;
> +    xport sport;
> +    xport mportno;
> +    xport vportno;
> +    QemuMutex archip_mutex;
> +    QemuCond archip_cond;
> +    bool is_signaled;
> +    /* Request handler specific */
> +    QemuThread request_th;
> +    QemuCond request_cond;
> +    QemuMutex request_mutex;
> +    bool th_is_signaled;
> +    bool stopping;
> +} BDRVArchipelagoState;
> +
> +typedef struct ArchipelagoSegmentedRequest {
> +    size_t count;
> +    size_t total;
> +    int ref;
> +    int failed;
> +} ArchipelagoSegmentedRequest;
> +
> +typedef struct AIORequestData {
> +    const char *volname;
> +    off_t offset;
> +    size_t size;
> +    uint64_t bufidx;
> +    int ret;
> +    int op;
> +    ArchipelagoAIOCB *aio_cb;
> +    ArchipelagoSegmentedRequest *segreq;
> +} AIORequestData;
> +
> +static void qemu_archipelago_complete_aio(void *opaque);
> +
> +static void init_local_signal(struct xseg *xseg, xport sport, xport srcport)
> +{
> +    if (xseg && (sport != srcport)) {
> +        xseg_init_local_signal(xseg, srcport);
> +        sport = srcport;
> +    }
> +}
> +
> +static void archipelago_finish_aiocb(AIORequestData *reqdata)
> +{
> +    if (reqdata->aio_cb->ret != reqdata->segreq->total) {
> +        reqdata->aio_cb->ret = -EIO;
> +    } else if (reqdata->aio_cb->ret == reqdata->segreq->total) {
> +        reqdata->aio_cb->ret = 0;
> +    }
> +    reqdata->aio_cb->bh = aio_bh_new(
> +                        bdrv_get_aio_context(reqdata->aio_cb->common.bs),
> +                        qemu_archipelago_complete_aio, reqdata
> +                        );
> +    qemu_bh_schedule(reqdata->aio_cb->bh);
> +}
> +
> +static int wait_reply(struct xseg *xseg, xport srcport, struct xseg_port *port,
> +                      struct xseg_request *expected_req)
> +{
> +    struct xseg_request *req;
> +    xseg_prepare_wait(xseg, srcport);
> +    void *psd = xseg_get_signal_desc(xseg, port);
> +    while (1) {
> +        req = xseg_receive(xseg, srcport, 0);
> +        if (req) {
> +            if (req != expected_req) {
> +                archipelagolog("Unknown received request\n");
> +                xseg_put_request(xseg, req, srcport);
> +            } else if (!(req->state & XS_SERVED)) {
> +                return -1;
> +            } else {
> +                break;
> +            }
> +        }
> +        xseg_wait_signal(xseg, psd, 100000UL);
> +    }
> +    xseg_cancel_wait(xseg, srcport);
> +    return 0;
> +}
> +
> +static void xseg_request_handler(void *state)
> +{
> +    BDRVArchipelagoState *s = (BDRVArchipelagoState *) state;
> +    void *psd = xseg_get_signal_desc(s->xseg, s->port);
> +    qemu_mutex_lock(&s->request_mutex);
> +
> +    while (!s->stopping) {
> +        struct xseg_request *req;
> +        void *data;
> +        xseg_prepare_wait(s->xseg, s->srcport);
> +        req = xseg_receive(s->xseg, s->srcport, 0);

Is this a blocking call?  If so, is there a timeout, and if not, what
scenarios (if any) could cause us to wait here indefinitely?

> +        if (req) {
> +            AIORequestData *reqdata;
> +            ArchipelagoSegmentedRequest *segreq;
> +            xseg_get_req_data(s->xseg, req, (void **)&reqdata);
> +
> +            switch (reqdata->op) {
> +            case ARCHIP_OP_READ:
> +                    data = xseg_get_data(s->xseg, req);
> +                    segreq = reqdata->segreq;
> +                    segreq->count += req->serviced;
> +
> +                    qemu_iovec_from_buf(reqdata->aio_cb->qiov, reqdata->bufidx,
> +                                        data,
> +                                        req->serviced);
> +
> +                    xseg_put_request(s->xseg, req, s->srcport);
> +
> +                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
> +                        if (!segreq->failed) {
> +                            reqdata->aio_cb->ret = segreq->count;
> +                            archipelago_finish_aiocb(reqdata);
> +                            g_free(segreq);
> +                        } else {
> +                            g_free(segreq);
> +                            g_free(reqdata);
> +                        }
> +                    } else {
> +                        g_free(reqdata);
> +                    }
> +                    break;
> +            case ARCHIP_OP_WRITE:
> +                    segreq = reqdata->segreq;
> +                    segreq->count += req->serviced;
> +                    xseg_put_request(s->xseg, req, s->srcport);
> +
> +                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
> +                        if (!segreq->failed) {
> +                            reqdata->aio_cb->ret = segreq->count;
> +                            archipelago_finish_aiocb(reqdata);
> +                            g_free(segreq);
> +                        } else {
> +                            g_free(segreq);
> +                            g_free(reqdata);
> +                        }
> +                    } else {
> +                        g_free(reqdata);
> +                    

This (OP_WRITE / OP_READ) is where I am worried that we leak in error
cases, and a _close() won't clean it up (see later comments).


> +                    break;
> +            case ARCHIP_OP_VOLINFO:
> +                    s->is_signaled = true;
> +                    qemu_cond_signal(&s->archip_cond);
> +                    break;
> +            }
> +        } else {
> +            xseg_wait_signal(s->xseg, psd, 100000UL);
> +        }
> +        xseg_cancel_wait(s->xseg, s->srcport);



> +    }
> +
> +    s->th_is_signaled = true;
> +    qemu_cond_signal(&s->request_cond);
> +    qemu_mutex_unlock(&s->request_mutex);
> +    qemu_thread_exit(NULL);
> +}
> +
> +static int qemu_archipelago_xseg_init(BDRVArchipelagoState *s)
> +{
> +    if (xseg_initialize()) {
> +        archipelagolog("Cannot initialize XSEG\n");
> +        goto err_exit;
> +    }
> +
> +    s->xseg = xseg_join((char *)"posix", s->segment_name,
> +                        (char *)"posixfd", NULL);
> +    if (!s->xseg) {
> +        archipelagolog("Cannot join XSEG shared memory segment\n");
> +        goto err_exit;
> +    }
> +    s->port = xseg_bind_dynport(s->xseg);
> +    s->srcport = s->port->portno;
> +    init_local_signal(s->xseg, s->sport, s->srcport);
> +    return 0;
> +
> +err_exit:
> +    return -1;
> +}
> +
> +static int qemu_archipelago_init(BDRVArchipelagoState *s)
> +{
> +    int ret;
> +
> +    ret = qemu_archipelago_xseg_init(s);
> +    if (ret < 0) {
> +        error_report("Cannot initialize XSEG. Aborting...\n");
> +        goto err_exit;
> +    }
> +
> +    qemu_cond_init(&s->archip_cond);
> +    qemu_mutex_init(&s->archip_mutex);
> +    qemu_cond_init(&s->request_cond);
> +    qemu_mutex_init(&s->request_mutex);
> +    s->th_is_signaled = false;
> +    qemu_thread_create(&s->request_th, "xseg_io_th",
> +                       (void *) xseg_request_handler,
> +                       (void *) s, QEMU_THREAD_JOINABLE);
> +
> +err_exit:
> +    return ret;
> +}
> +
> +static void qemu_archipelago_complete_aio(void *opaque)
> +{
> +    AIORequestData *reqdata = (AIORequestData *) opaque;
> +    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;
> +
> +    qemu_bh_delete(aio_cb->bh);
> +    qemu_vfree(aio_cb->buffer);
> +    aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
> +    aio_cb->status = 0;
> +
> +    if (!aio_cb->cancelled) {
> +        qemu_aio_release(aio_cb);
> +    }
> +    g_free(reqdata);
> +}
> +
> +static QemuOptsList archipelago_runtime_opts = {
> +    .name = "archipelago",
> +    .head = QTAILQ_HEAD_INITIALIZER(archipelago_runtime_opts.head),
> +    .desc = {
> +        {
> +            .name = ARCHIPELAGO_OPT_VOLUME,
> +            .type = QEMU_OPT_STRING,
> +            .help = "Name of the volume image",
> +        },
> +        {
> +            .name = ARCHIPELAGO_OPT_SEGMENT,
> +            .type = QEMU_OPT_STRING,
> +            .help = "Name of the Archipelago shared memory segment",
> +        },
> +        {
> +            .name = ARCHIPELAGO_OPT_MPORT,
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Archipelago mapperd port number"
> +        },
> +        {
> +            .name = ARCHIPELAGO_OPT_VPORT,
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "Archipelago vlmcd port number"
> +
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
> +static int qemu_archipelago_open(BlockDriverState *bs,
> +                                 QDict *options,
> +                                 int bdrv_flags,
> +                                 Error **errp)
> +{
> +    int ret = 0;
> +    const char *volume, *segment_name;
> +    QemuOpts *opts;
> +    Error *local_err = NULL;
> +    BDRVArchipelagoState *s = bs->opaque;
> +
> +    opts = qemu_opts_create(&archipelago_runtime_opts, NULL, 0, &error_abort);
> +    qemu_opts_absorb_qdict(opts, options, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        qemu_opts_del(opts);
> +        return -EINVAL;
> +    }
> +
> +    s->mportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_MPORT, 1001);
> +    s->vportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_VPORT, 501);
> +
> +    segment_name = qemu_opt_get(opts, ARCHIPELAGO_OPT_SEGMENT);
> +    if (segment_name == NULL) {
> +        s->segment_name = g_strdup("archipelago");
> +    } else {
> +        s->segment_name = g_strdup(segment_name);
> +    }
> +
> +    volume = qemu_opt_get(opts, ARCHIPELAGO_OPT_VOLUME);
> +    if (volume == NULL) {
> +        error_setg(errp, "archipelago block driver requires the 'volume'"
> +                   " option");
> +        qemu_opts_del(opts);
> +        return -EINVAL;

s->segment_name is leaked here.

You already have an exit label (err_exit) that cleans everything up,
and g_free() is NULL safe (and bs->opaque is zero-initialized).

You should be able to just set ret, and 'goto err_exit' in each error
instance in qemu_archipelago_open() - this also gets rid of the extra
qemu_opts_del() calls.

> +    }
> +    s->volname = g_strdup(volume);
> +
> +    /* Initialize XSEG, join shared memory segment */
> +    ret = qemu_archipelago_init(s);
> +    if (ret < 0) {
> +        error_setg(errp, "cannot initialize XSEG and join shared "
> +                   "memory segment");
> +        goto err_exit;
> +    }
> +
> +    qemu_opts_del(opts);
> +    return 0;
> +
> +err_exit:
> +    g_free(s->volname);
> +    g_free(s->segment_name);
> +    qemu_opts_del(opts);
> +    return ret;
> +}
> +
> +static void qemu_archipelago_close(BlockDriverState *bs)
> +{
> +    int r, targetlen;
> +    char *target;
> +    struct xseg_request *req;
> +    BDRVArchipelagoState *s = bs->opaque;
> +
> +    s->stopping = true;
> +
> +    qemu_mutex_lock(&s->request_mutex);
> +    while (!s->th_is_signaled) {
> +        qemu_cond_wait(&s->request_cond,
> +                       &s->request_mutex);
> +    }
> +    qemu_mutex_unlock(&s->request_mutex);
> +    qemu_thread_join(&s->request_th);
> +    qemu_cond_destroy(&s->request_cond);
> +    qemu_mutex_destroy(&s->request_mutex);
> +
> +    qemu_cond_destroy(&s->archip_cond);
> +    qemu_mutex_destroy(&s->archip_mutex);
> +
> +    targetlen = strlen(s->volname);

Should this be strlen(s->volname) + 1, to account for the '\0'?  Or
does xseg_prep_request() just need the length of the non-null
terminated string?

> +    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
> +    if (!req) {
> +        archipelagolog("Cannot get XSEG request\n");
> +        goto err_exit;
> +    }
> +    r = xseg_prep_request(s->xseg, req, targetlen, 0);
> +    if (r < 0) {
> +        xseg_put_request(s->xseg, req, s->srcport);

What does this do here, if xseg_prep_request() failed?  Is it
essentially a cleanup function?

> +        archipelagolog("Cannot prepare XSEG close request\n");
> +        goto err_exit;
> +    }
> +
> +    target = xseg_get_target(s->xseg, req);
> +    memcpy(target, s->volname, targetlen);
> +    req->size = req->datalen;
> +    req->offset = 0;
> +    req->op = X_CLOSE;
> +
> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
> +    if (p == NoPort) {
> +        xseg_put_request(s->xseg, req, s->srcport);
> +        archipelagolog("Cannot submit XSEG close request\n");
> +        goto err_exit;
> +    }
> +
> +    xseg_signal(s->xseg, p);
> +    wait_reply(s->xseg, s->srcport, s->port, req);

This is another spot I am wondering if we could get stuck on a
blocking call that could potentially wait forever... is there a
timeout here?

> +
> +    xseg_put_request(s->xseg, req, s->srcport);
> +
> +err_exit:
> +    g_free(s->volname);
> +    g_free(s->segment_name);
> +    xseg_quit_local_signal(s->xseg, s->srcport);
> +    xseg_leave_dynport(s->xseg, s->port);
> +    xseg_leave(s->xseg);
> +}
> +
> +static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb)
> +{
> +    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb;
> +    aio_cb->cancelled = true;
> +    while (aio_cb->status == -EINPROGRESS) {
> +        qemu_aio_wait();
> +    }
> +    qemu_aio_release(aio_cb);
> +}
> +
> +static const AIOCBInfo archipelago_aiocb_info = {
> +    .aiocb_size = sizeof(ArchipelagoAIOCB),
> +    .cancel = qemu_archipelago_aio_cancel,
> +};
> +
> +static int __archipelago_submit_request(BDRVArchipelagoState *s,
> +                                        uint64_t bufidx,
> +                                        size_t count,
> +                                        off_t offset,
> +                                        ArchipelagoAIOCB *aio_cb,
> +                                        ArchipelagoSegmentedRequest *segreq,
> +                                        int op)
> +{
> +    int ret, targetlen;
> +    char *target;
> +    void *data = NULL;
> +    struct xseg_request *req;
> +    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
> +
> +    targetlen = strlen(s->volname);
> +    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
> +    if (!req) {
> +        archipelagolog("Cannot get XSEG request\n");
> +        goto err_exit2;
> +    }
> +    ret = xseg_prep_request(s->xseg, req, targetlen, count);
> +    if (ret < 0) {
> +        archipelagolog("Cannot prepare XSEG request\n");
> +        goto err_exit;
> +    }
> +    target = xseg_get_target(s->xseg, req);
> +    if (!target) {
> +        archipelagolog("Cannot get XSEG target\n");
> +        goto err_exit;
> +    }
> +    memcpy(target, s->volname, targetlen);
> +    req->size = count;
> +    req->offset = offset;
> +
> +    switch (op) {
> +    case ARCHIP_OP_READ:
> +        req->op = X_READ;
> +        break;
> +    case ARCHIP_OP_WRITE:
> +        req->op = X_WRITE;
> +        break;
> +    }
> +    reqdata->volname = s->volname;
> +    reqdata->offset = offset;
> +    reqdata->size = count;
> +    reqdata->bufidx = bufidx;
> +    reqdata->aio_cb = aio_cb;
> +    reqdata->segreq = segreq;
> +    reqdata->op = op;
> +
> +    xseg_set_req_data(s->xseg, req, reqdata);
> +    if (op == ARCHIP_OP_WRITE) {
> +        data = xseg_get_data(s->xseg, req);
> +        if (!data) {
> +            archipelagolog("Cannot get XSEG data\n");
> +            goto err_exit;
> +        }
> +        memcpy(data, aio_cb->buffer + bufidx, count);
> +    }
> +
> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
> +    if (p == NoPort) {
> +        archipelagolog("Could not submit XSEG request\n");
> +        goto err_exit;
> +    }
> +    xseg_signal(s->xseg, p);
> +    return 0;
> +
> +err_exit:
> +    g_free(reqdata);
> +    xseg_put_request(s->xseg, req, s->srcport);
> +    return -EIO;
> +err_exit2:
> +    g_free(reqdata);
> +    return -EIO;
> +}
> +
> +static int archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
> +                                        size_t count,
> +                                        off_t offset,
> +                                        ArchipelagoAIOCB *aio_cb,
> +                                        int op)
> +{
> +    int i, ret, segments_nr, last_segment_size;
> +    ArchipelagoSegmentedRequest *segreq;
> +
> +    segreq = g_malloc(sizeof(ArchipelagoSegmentedRequest));
> +
> +    if (op == ARCHIP_OP_FLUSH) {
> +        segments_nr = 1;
> +        segreq->ref = segments_nr;
> +        segreq->total = count;
> +        segreq->count = 0;
> +        segreq->failed = 0;
> +        ret = __archipelago_submit_request(s, 0, count, offset, aio_cb,
> +                                           segreq, ARCHIP_OP_WRITE);
> +        if (ret < 0) {
> +            goto err_exit;
> +        }
> +        return 0;
> +    }
> +
> +    segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
> +                  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
> +    last_segment_size = (int)(count % MAX_REQUEST_SIZE);
> +
> +    segreq->ref = segments_nr;
> +    segreq->total = count;
> +    segreq->count = 0;
> +    segreq->failed = 0;
> +
> +    for (i = 0; i < segments_nr - 1; i++) {
> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
> +                                           MAX_REQUEST_SIZE,
> +                                           offset + i * MAX_REQUEST_SIZE,
> +                                           aio_cb, segreq, op);
> +
> +        if (ret < 0) {
> +            goto err_exit;
> +        }
> +    }
> +
> +    if ((segments_nr > 1) && last_segment_size) {
> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
> +                                           last_segment_size,
> +                                           offset + i * MAX_REQUEST_SIZE,
> +                                           aio_cb, segreq, op);
> +    } else if ((segments_nr > 1) && !last_segment_size) {
> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
> +                                           MAX_REQUEST_SIZE,
> +                                           offset + i * MAX_REQUEST_SIZE,
> +                                           aio_cb, segreq, op);
> +    } else if (segments_nr == 1) {
> +            ret = __archipelago_submit_request(s, 0, count, offset, aio_cb,
> +                                               segreq, op);
> +    }
> +
> +    if (ret < 0) {
> +        goto err_exit;
> +    }
> +
> +    return 0;
> +
> +err_exit:
> +    __sync_add_and_fetch(&segreq->failed, 1);
> +    if (segments_nr == 1) {
> +        if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
> +            g_free(segreq);
> +        }
> +    } else {
> +        if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
> +            g_free(segreq);
> +        }
> +    }

Don't we run the risk of leaking segreq here?  The other place this is
freed is in xseg_request_handler(), but could we run into a race
condition where 's->stopping' is true, or even xseg_receive() just does not
return a request? 

> +
> +    return ret;
> +}
> +
> +static BlockDriverAIOCB *qemu_archipelago_aio_rw(BlockDriverState *bs,
> +                                                 int64_t sector_num,
> +                                                 QEMUIOVector *qiov,
> +                                                 int nb_sectors,
> +                                                 BlockDriverCompletionFunc *cb,
> +                                                 void *opaque,
> +                                                 int op)
> +{
> +    ArchipelagoAIOCB *aio_cb;
> +    BDRVArchipelagoState *s = bs->opaque;
> +    int64_t size, off;
> +    int ret;
> +
> +    aio_cb = qemu_aio_get(&archipelago_aiocb_info, bs, cb, opaque);
> +    aio_cb->cmd = op;
> +    aio_cb->qiov = qiov;
> +
> +    if (op != ARCHIP_OP_FLUSH) {
> +        aio_cb->buffer = qemu_blockalign(bs, qiov->size);
> +    } else {
> +        aio_cb->buffer = NULL;
> +    }
> +
> +    aio_cb->ret = 0;
> +    aio_cb->s = s;
> +    aio_cb->cancelled = false;
> +    aio_cb->status = -EINPROGRESS;
> +
> +    if (op == ARCHIP_OP_WRITE) {
> +        qemu_iovec_to_buf(aio_cb->qiov, 0, aio_cb->buffer, qiov->size);
> +    }
> +
> +    off = sector_num * BDRV_SECTOR_SIZE;
> +    size = nb_sectors * BDRV_SECTOR_SIZE;
> +    aio_cb->size = size;
> +
> +    ret = archipelago_aio_segmented_rw(s, size, off,
> +                                       aio_cb, op);
> +    if (ret < 0) {
> +        goto err_exit;
> +    }
> +    return &aio_cb->common;
> +
> +err_exit:
> +    error_report("qemu_archipelago_aio_rw(): I/O Error\n");
> +    qemu_vfree(aio_cb->buffer);
> +    qemu_aio_release(aio_cb);
> +    return NULL;
> +}
> +
> +static BlockDriverAIOCB *qemu_archipelago_aio_readv(BlockDriverState *bs,
> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    return qemu_archipelago_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
> +                                   opaque, ARCHIP_OP_READ);
> +}
> +
> +static BlockDriverAIOCB *qemu_archipelago_aio_writev(BlockDriverState *bs,
> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    return qemu_archipelago_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
> +                                   opaque, ARCHIP_OP_WRITE);
> +}
> +
> +static int64_t archipelago_volume_info(BDRVArchipelagoState *s)
> +{
> +    uint64_t size;
> +    int ret, targetlen;
> +    struct xseg_request *req;
> +    struct xseg_reply_info *xinfo;
> +    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
> +
> +    const char *volname = s->volname;
> +    targetlen = strlen(volname);
> +    req = xseg_get_request(s->xseg, s->srcport, s->mportno, X_ALLOC);
> +    if (!req) {
> +        archipelagolog("Cannot get XSEG request\n");
> +        goto err_exit2;
> +    }
> +    ret = xseg_prep_request(s->xseg, req, targetlen,
> +                            sizeof(struct xseg_reply_info));
> +    if (ret < 0) {
> +        archipelagolog("Cannot prepare XSEG request\n");
> +        goto err_exit;
> +    }
> +    char *target = xseg_get_target(s->xseg, req);
> +    if (!target) {
> +        archipelagolog("Cannot get XSEG target\n");
> +        goto err_exit;
> +    }
> +    memcpy(target, volname, targetlen);
> +    req->size = req->datalen;
> +    req->offset = 0;
> +    req->op = X_INFO;
> +
> +    reqdata->op = ARCHIP_OP_VOLINFO;
> +    reqdata->volname = volname;
> +    xseg_set_req_data(s->xseg, req, reqdata);
> +
> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
> +    if (p == NoPort) {
> +        archipelagolog("Cannot submit XSEG request\n");
> +        goto err_exit;
> +    }
> +    xseg_signal(s->xseg, p);
> +    qemu_mutex_lock(&s->archip_mutex);
> +    while (!s->is_signaled) {
> +        qemu_cond_wait(&s->archip_cond, &s->archip_mutex);
> +    }
> +    s->is_signaled = false;
> +    qemu_mutex_unlock(&s->archip_mutex);
> +
> +    xinfo = (struct xseg_reply_info *) xseg_get_data(s->xseg, req);
> +    size = xinfo->size;
> +    xseg_put_request(s->xseg, req, s->srcport);
> +    g_free(reqdata);
> +    s->size = size;
> +    return size;
> +



> +err_exit:
> +    g_free(reqdata);
> +    xseg_put_request(s->xseg, req, s->srcport);
> +    return -1;
> +err_exit2:
> +    g_free(reqdata);
> +    return -1;
> +}

This could be simplified to just:

 err_exit:
     xseg_put_request(s->xseg, req, s->srcport);
 err_exit2:
     g_free(reqdata);
     return -1;
 }

Maybe it'd also be best to return -EIO (or other meaningful error
value) instead of just -1, as this value gets passed along to
.bdrv_getlength().

> +
> +static int64_t qemu_archipelago_getlength(BlockDriverState *bs)
> +{
> +    int64_t ret;
> +    BDRVArchipelagoState *s = bs->opaque;
> +
> +    ret = archipelago_volume_info(s);

(This is where I am talking about an error value such as -EIO may be
better)

> +    return ret;
> +}
> +
> +static BlockDriverAIOCB *qemu_archipelago_aio_flush(BlockDriverState *bs,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    return qemu_archipelago_aio_rw(bs, 0, NULL, 0, cb, opaque,
> +                                   ARCHIP_OP_FLUSH);
> +}
> +
> +static BlockDriver bdrv_archipelago = {
> +    .format_name         = "archipelago",
> +    .protocol_name       = "archipelago",
> +    .instance_size       = sizeof(BDRVArchipelagoState),
> +    .bdrv_file_open      = qemu_archipelago_open,
> +    .bdrv_close          = qemu_archipelago_close,
> +    .bdrv_getlength      = qemu_archipelago_getlength,
> +    .bdrv_aio_readv      = qemu_archipelago_aio_readv,
> +    .bdrv_aio_writev     = qemu_archipelago_aio_writev,
> +    .bdrv_aio_flush      = qemu_archipelago_aio_flush,
> +    .bdrv_has_zero_init  = bdrv_has_zero_init_1,
> +};
> +
> +static void bdrv_archipelago_init(void)
> +{
> +    bdrv_register(&bdrv_archipelago);
> +}
> +
> +block_init(bdrv_archipelago_init);
> diff --git a/configure b/configure
> index 7102964..e4acd9c 100755
> --- a/configure
> +++ b/configure
> @@ -326,6 +326,7 @@ seccomp=""
>  glusterfs=""
>  glusterfs_discard="no"
>  glusterfs_zerofill="no"
> +archipelago=""
>  virtio_blk_data_plane=""
>  gtk=""
>  gtkabi=""
> @@ -1087,6 +1088,10 @@ for opt do
>    ;;
>    --enable-glusterfs) glusterfs="yes"
>    ;;
> +  --disable-archipelago) archipelago="no"
> +  ;;
> +  --enable-archipelago) archipelago="yes"
> +  ;;
>    --disable-virtio-blk-data-plane) virtio_blk_data_plane="no"
>    ;;
>    --enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
> @@ -1382,6 +1387,8 @@ Advanced options (experts only):
>    --enable-coroutine-pool  enable coroutine freelist (better performance)
>    --enable-glusterfs       enable GlusterFS backend
>    --disable-glusterfs      disable GlusterFS backend
> +  --enable-archipelago     enable Archipelago backend
> +  --disable-archipelago    disable Archipelago backend
>    --enable-gcov            enable test coverage analysis with gcov
>    --gcov=GCOV              use specified gcov [$gcov_tool]
>    --disable-tpm            disable TPM support
> @@ -3051,6 +3058,33 @@ EOF
>    fi
>  fi
>  
> +
> +##########################################
> +# archipelago probe
> +if test "$archipelago" != "no" ; then
> +    cat > $TMPC <<EOF
> +#include <stdio.h>
> +#include <xseg/xseg.h>
> +#include <xseg/protocol.h>
> +int main(void) {
> +    xseg_initialize();
> +    return 0;
> +}
> +EOF
> +    archipelago_libs=-lxseg
> +    if compile_prog "" "$archipelago_libs"; then
> +        archipelago="yes"
> +        libs_tools="$archipelago_libs $libs_tools"
> +        libs_softmmu="$archipelago_libs $libs_softmmu"
> +    else
> +      if test "$archipelago" = "yes" ; then
> +        feature_not_found "Archipelago backend support" "Install libxseg devel"
> +      fi
> +      archipelago="no"
> +    fi
> +fi
> +
> +
>  ##########################################
>  # glusterfs probe
>  if test "$glusterfs" != "no" ; then
> @@ -4230,6 +4264,7 @@ echo "seccomp support   $seccomp"
>  echo "coroutine backend $coroutine"
>  echo "coroutine pool    $coroutine_pool"
>  echo "GlusterFS support $glusterfs"
> +echo "Archipelago support $archipelago"
>  echo "virtio-blk-data-plane $virtio_blk_data_plane"
>  echo "gcov              $gcov_tool"
>  echo "gcov enabled      $gcov"
> @@ -4665,6 +4700,11 @@ if test "$glusterfs_zerofill" = "yes" ; then
>    echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
>  fi
>  
> +if test "$archipelago" = "yes" ; then
> +  echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
> +  echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
> +fi
> +
>  if test "$libssh2" = "yes" ; then
>    echo "CONFIG_LIBSSH2=m" >> $config_host_mak
>    echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
> -- 
> 1.7.10.4
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-07-10  0:23   ` Jeff Cody
@ 2014-07-10 10:04     ` Chrysostomos Nanakos
  2014-07-10 14:02       ` Chrysostomos Nanakos
  2014-07-22 12:40       ` Stefan Hajnoczi
  0 siblings, 2 replies; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-07-10 10:04 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On 07/10/2014 03:23 AM, Jeff Cody wrote:
> On Fri, Jun 27, 2014 at 11:24:08AM +0300, Chrysostomos Nanakos wrote:
>> VM Image on Archipelago volume is specified like this:
>>
>> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
>> file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
>>
>> 'archipelago' is the protocol.
>>
>> 'mport' is the port number on which mapperd is listening. This is optional
>> and if not specified, QEMU will make Archipelago to use the default port.
>>
>> 'vport' is the port number on which vlmcd is listening. This is optional
>> and if not specified, QEMU will make Archipelago to use the default port.
>>
>> 'segment' is the name of the shared memory segment Archipelago stack is using.
>> This is optional and if not specified, QEMU will make Archipelago to use the
>> default value, 'archipelago'.
>>
>> Examples:
>>
>> file.driver=archipelago,file.volume=my_vm_volume
>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>> file.vport=1234
>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>> file.vport=1234,file.segment=my_segment
>>
>> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
> This is just a superficial review, because I don't have a good idea of
> what archipelago or libxseg really does (I didn't even compile it or
> these patches).  But I scanned through this patch, and found a few
> things, and had a few questions.

No worries, every review is more than welcome.

>
>> ---
>>   MAINTAINERS         |    6 +
>>   block/Makefile.objs |    2 +
>>   block/archipelago.c |  819 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   configure           |   40 +++
>>   4 files changed, 867 insertions(+)
>>   create mode 100644 block/archipelago.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 9b93edd..58ef1e3 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -999,3 +999,9 @@ SSH
>>   M: Richard W.M. Jones <rjones@redhat.com>
>>   S: Supported
>>   F: block/ssh.c
>> +
>> +ARCHIPELAGO
>> +M: Chrysostomos Nanakos <cnanakos@grnet.gr>
>> +M: Chrysostomos Nanakos <chris@include.gr>
>> +S: Maintained
>> +F: block/archipelago.c
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index fd88c03..858d2b3 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -17,6 +17,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
>>   block-obj-$(CONFIG_CURL) += curl.o
>>   block-obj-$(CONFIG_RBD) += rbd.o
>>   block-obj-$(CONFIG_GLUSTERFS) += gluster.o
>> +block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>>   block-obj-$(CONFIG_LIBSSH2) += ssh.o
>>   endif
>>   
>> @@ -35,5 +36,6 @@ gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
>>   gluster.o-libs     := $(GLUSTERFS_LIBS)
>>   ssh.o-cflags       := $(LIBSSH2_CFLAGS)
>>   ssh.o-libs         := $(LIBSSH2_LIBS)
>> +archipelago.o-libs := $(ARCHIPELAGO_LIBS)
>>   qcow.o-libs        := -lz
>>   linux-aio.o-libs   := -laio
>> diff --git a/block/archipelago.c b/block/archipelago.c
>> new file mode 100644
>> index 0000000..c56826a
>> --- /dev/null
>> +++ b/block/archipelago.c
>> @@ -0,0 +1,819 @@
>> +/*
>> + * QEMU Block driver for Archipelago
>> + *
>> + * Copyright 2014 GRNET S.A. All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or
>> + * without modification, are permitted provided that the following
>> + * conditions are met:
>> + *
>> + *   1. Redistributions of source code must retain the above
>> + *      copyright notice, this list of conditions and the following
>> + *      disclaimer.
>> + *   2. Redistributions in binary form must reproduce the above
>> + *      copyright notice, this list of conditions and the following
>> + *      disclaimer in the documentation and/or other materials
>> + *      provided with the distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY GRNET S.A. ``AS IS'' AND ANY EXPRESS
>> + * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
>> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>> + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GRNET S.A OR
>> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> + * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
>> + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>> + * POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * The views and conclusions contained in the software and
>> + * documentation are those of the authors and should not be
>> + * interpreted as representing official policies, either expressed
>> + * or implied, of GRNET S.A.
>> + */
>> +
>> +/*
>> +* VM Image on Archipelago volume is specified like this:
>> +*
>> +* file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
>> +* file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
>> +*
>> +* 'archipelago' is the protocol.
>> +*
>> +* 'mport' is the port number on which mapperd is listening. This is optional
>> +* and if not specified, QEMU will make Archipelago to use the default port.
>> +*
>> +* 'vport' is the port number on which vlmcd is listening. This is optional
>> +* and if not specified, QEMU will make Archipelago to use the default port.
>> +*
>> +* 'segment' is the name of the shared memory segment Archipelago stack is using.
>> +* This is optional and if not specified, QEMU will make Archipelago to use the
>> +* default value, 'archipelago'.
>> +*
>> +* Examples:
>> +*
>> +* file.driver=archipelago,file.volume=my_vm_volume
>> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
>> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>> +* file.vport=1234
>> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>> +* file.vport=1234,file.segment=my_segment
>> +*/
>> +
>> +#include "block/block_int.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/thread.h"
>> +#include "qapi/qmp/qint.h"
>> +#include "qapi/qmp/qstring.h"
>> +#include "qapi/qmp/qjson.h"
>> +
>> +#include <inttypes.h>
>> +#include <xseg/xseg.h>
>> +#include <xseg/protocol.h>
>> +
>> +#define ARCHIP_FD_READ      0
>> +#define ARCHIP_FD_WRITE     1
>> +#define MAX_REQUEST_SIZE    524288
>> +
>> +#define ARCHIPELAGO_OPT_VOLUME      "volume"
>> +#define ARCHIPELAGO_OPT_SEGMENT     "segment"
>> +#define ARCHIPELAGO_OPT_MPORT       "mport"
>> +#define ARCHIPELAGO_OPT_VPORT       "vport"
>> +
>> +#define archipelagolog(fmt, ...) \
>> +    do {                         \
>> +        fprintf(stderr, "archipelago\t%-24s: " fmt, __func__, ##__VA_ARGS__); \
>> +    } while (0)
>> +
>> +typedef enum {
>> +    ARCHIP_OP_READ,
>> +    ARCHIP_OP_WRITE,
>> +    ARCHIP_OP_FLUSH,
>> +    ARCHIP_OP_VOLINFO,
>> +} ARCHIPCmd;
>> +
>> +typedef struct ArchipelagoAIOCB {
>> +    BlockDriverAIOCB common;
>> +    QEMUBH *bh;
>> +    struct BDRVArchipelagoState *s;
>> +    QEMUIOVector *qiov;
>> +    void *buffer;
>> +    ARCHIPCmd cmd;
>> +    bool cancelled;
>> +    int status;
>> +    int64_t size;
>> +    int64_t ret;
>> +} ArchipelagoAIOCB;
>> +
>> +typedef struct BDRVArchipelagoState {
>> +    ArchipelagoAIOCB *event_acb;
>> +    char *volname;
>> +    char *segment_name;
>> +    uint64_t size;
>> +    /* Archipelago specific */
>> +    struct xseg *xseg;
> I assume s->xseg is allocated in xseg_join() - is it ever freed?  In
> _close(), there is a final call to xseg_leave(s->xseg), but from what
> I found in libxseg, it does not appear to be freed:
> https://github.com/cnanakos/libxseg/blob/develop/src/xseg.c#L975
>
> Is it up to libxseg to free xseg, or the caller?

libxseg allocated xseg and should free it also. I will fix that in 
libxseg. Thanks!

>> +    struct xseg_port *port;
>> +    xport srcport;
>> +    xport sport;
>> +    xport mportno;
>> +    xport vportno;
>> +    QemuMutex archip_mutex;
>> +    QemuCond archip_cond;
>> +    bool is_signaled;
>> +    /* Request handler specific */
>> +    QemuThread request_th;
>> +    QemuCond request_cond;
>> +    QemuMutex request_mutex;
>> +    bool th_is_signaled;
>> +    bool stopping;
>> +} BDRVArchipelagoState;
>> +
>> +typedef struct ArchipelagoSegmentedRequest {
>> +    size_t count;
>> +    size_t total;
>> +    int ref;
>> +    int failed;
>> +} ArchipelagoSegmentedRequest;
>> +
>> +typedef struct AIORequestData {
>> +    const char *volname;
>> +    off_t offset;
>> +    size_t size;
>> +    uint64_t bufidx;
>> +    int ret;
>> +    int op;
>> +    ArchipelagoAIOCB *aio_cb;
>> +    ArchipelagoSegmentedRequest *segreq;
>> +} AIORequestData;
>> +
>> +static void qemu_archipelago_complete_aio(void *opaque);
>> +
>> +static void init_local_signal(struct xseg *xseg, xport sport, xport srcport)
>> +{
>> +    if (xseg && (sport != srcport)) {
>> +        xseg_init_local_signal(xseg, srcport);
>> +        sport = srcport;
>> +    }
>> +}
>> +
>> +static void archipelago_finish_aiocb(AIORequestData *reqdata)
>> +{
>> +    if (reqdata->aio_cb->ret != reqdata->segreq->total) {
>> +        reqdata->aio_cb->ret = -EIO;
>> +    } else if (reqdata->aio_cb->ret == reqdata->segreq->total) {
>> +        reqdata->aio_cb->ret = 0;
>> +    }
>> +    reqdata->aio_cb->bh = aio_bh_new(
>> +                        bdrv_get_aio_context(reqdata->aio_cb->common.bs),
>> +                        qemu_archipelago_complete_aio, reqdata
>> +                        );
>> +    qemu_bh_schedule(reqdata->aio_cb->bh);
>> +}
>> +
>> +static int wait_reply(struct xseg *xseg, xport srcport, struct xseg_port *port,
>> +                      struct xseg_request *expected_req)
>> +{
>> +    struct xseg_request *req;
>> +    xseg_prepare_wait(xseg, srcport);
>> +    void *psd = xseg_get_signal_desc(xseg, port);
>> +    while (1) {
>> +        req = xseg_receive(xseg, srcport, 0);
>> +        if (req) {
>> +            if (req != expected_req) {
>> +                archipelagolog("Unknown received request\n");
>> +                xseg_put_request(xseg, req, srcport);
>> +            } else if (!(req->state & XS_SERVED)) {
>> +                return -1;
>> +            } else {
>> +                break;
>> +            }
>> +        }
>> +        xseg_wait_signal(xseg, psd, 100000UL);
>> +    }
>> +    xseg_cancel_wait(xseg, srcport);
>> +    return 0;
>> +}
>> +
>> +static void xseg_request_handler(void *state)
>> +{
>> +    BDRVArchipelagoState *s = (BDRVArchipelagoState *) state;
>> +    void *psd = xseg_get_signal_desc(s->xseg, s->port);
>> +    qemu_mutex_lock(&s->request_mutex);
>> +
>> +    while (!s->stopping) {
>> +        struct xseg_request *req;
>> +        void *data;
>> +        xseg_prepare_wait(s->xseg, s->srcport);
>> +        req = xseg_receive(s->xseg, s->srcport, 0);
> Is this a blocking call?  If so, is there a timeout, and if not, what
> scenarios (if any) could cause us to wait here indefinitely?

xseg_receive won't block until a request is received but It will wait 
until it takes
the volume lock, check if there is or not a request and will return 
after giving back the volume lock.
If it can't take the lock, it could wait indefinitely, there is no 
timeout for that at the moment
but I could easily add this kind of functionality.

On the other hand xseg_receive won't wait for the volume lock if the 
caller sets X_NONBLOCK.

After that, I believe I should set here and in wait_reply() also, the 
X_NONBLOCK flag.



>
>> +        if (req) {
>> +            AIORequestData *reqdata;
>> +            ArchipelagoSegmentedRequest *segreq;
>> +            xseg_get_req_data(s->xseg, req, (void **)&reqdata);
>> +
>> +            switch (reqdata->op) {
>> +            case ARCHIP_OP_READ:
>> +                    data = xseg_get_data(s->xseg, req);
>> +                    segreq = reqdata->segreq;
>> +                    segreq->count += req->serviced;
>> +
>> +                    qemu_iovec_from_buf(reqdata->aio_cb->qiov, reqdata->bufidx,
>> +                                        data,
>> +                                        req->serviced);
>> +
>> +                    xseg_put_request(s->xseg, req, s->srcport);
>> +
>> +                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
>> +                        if (!segreq->failed) {
>> +                            reqdata->aio_cb->ret = segreq->count;
>> +                            archipelago_finish_aiocb(reqdata);
>> +                            g_free(segreq);
>> +                        } else {
>> +                            g_free(segreq);
>> +                            g_free(reqdata);
>> +                        }
>> +                    } else {
>> +                        g_free(reqdata);
>> +                    }
>> +                    break;
>> +            case ARCHIP_OP_WRITE:
>> +                    segreq = reqdata->segreq;
>> +                    segreq->count += req->serviced;
>> +                    xseg_put_request(s->xseg, req, s->srcport);
>> +
>> +                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 0) {
>> +                        if (!segreq->failed) {
>> +                            reqdata->aio_cb->ret = segreq->count;
>> +                            archipelago_finish_aiocb(reqdata);
>> +                            g_free(segreq);
>> +                        } else {
>> +                            g_free(segreq);
>> +                            g_free(reqdata);
>> +                        }
>> +                    } else {
>> +                        g_free(reqdata);
>> +
> This (OP_WRITE / OP_READ) is where I am worried that we leak in error
> cases, and a _close() won't clean it up (see later comments).

>> +                    break;
>> +            case ARCHIP_OP_VOLINFO:
>> +                    s->is_signaled = true;
>> +                    qemu_cond_signal(&s->archip_cond);
>> +                    break;
>> +            }
>> +        } else {
>> +            xseg_wait_signal(s->xseg, psd, 100000UL);
>> +        }
>> +        xseg_cancel_wait(s->xseg, s->srcport);
>
>
>> +    }
>> +
>> +    s->th_is_signaled = true;
>> +    qemu_cond_signal(&s->request_cond);
>> +    qemu_mutex_unlock(&s->request_mutex);
>> +    qemu_thread_exit(NULL);
>> +}
>> +
>> +static int qemu_archipelago_xseg_init(BDRVArchipelagoState *s)
>> +{
>> +    if (xseg_initialize()) {
>> +        archipelagolog("Cannot initialize XSEG\n");
>> +        goto err_exit;
>> +    }
>> +
>> +    s->xseg = xseg_join((char *)"posix", s->segment_name,
>> +                        (char *)"posixfd", NULL);
>> +    if (!s->xseg) {
>> +        archipelagolog("Cannot join XSEG shared memory segment\n");
>> +        goto err_exit;
>> +    }
>> +    s->port = xseg_bind_dynport(s->xseg);
>> +    s->srcport = s->port->portno;
>> +    init_local_signal(s->xseg, s->sport, s->srcport);
>> +    return 0;
>> +
>> +err_exit:
>> +    return -1;
>> +}
>> +
>> +static int qemu_archipelago_init(BDRVArchipelagoState *s)
>> +{
>> +    int ret;
>> +
>> +    ret = qemu_archipelago_xseg_init(s);
>> +    if (ret < 0) {
>> +        error_report("Cannot initialize XSEG. Aborting...\n");
>> +        goto err_exit;
>> +    }
>> +
>> +    qemu_cond_init(&s->archip_cond);
>> +    qemu_mutex_init(&s->archip_mutex);
>> +    qemu_cond_init(&s->request_cond);
>> +    qemu_mutex_init(&s->request_mutex);
>> +    s->th_is_signaled = false;
>> +    qemu_thread_create(&s->request_th, "xseg_io_th",
>> +                       (void *) xseg_request_handler,
>> +                       (void *) s, QEMU_THREAD_JOINABLE);
>> +
>> +err_exit:
>> +    return ret;
>> +}
>> +
>> +static void qemu_archipelago_complete_aio(void *opaque)
>> +{
>> +    AIORequestData *reqdata = (AIORequestData *) opaque;
>> +    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;
>> +
>> +    qemu_bh_delete(aio_cb->bh);
>> +    qemu_vfree(aio_cb->buffer);
>> +    aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
>> +    aio_cb->status = 0;
>> +
>> +    if (!aio_cb->cancelled) {
>> +        qemu_aio_release(aio_cb);
>> +    }
>> +    g_free(reqdata);
>> +}
>> +
>> +static QemuOptsList archipelago_runtime_opts = {
>> +    .name = "archipelago",
>> +    .head = QTAILQ_HEAD_INITIALIZER(archipelago_runtime_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = ARCHIPELAGO_OPT_VOLUME,
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "Name of the volume image",
>> +        },
>> +        {
>> +            .name = ARCHIPELAGO_OPT_SEGMENT,
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "Name of the Archipelago shared memory segment",
>> +        },
>> +        {
>> +            .name = ARCHIPELAGO_OPT_MPORT,
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "Archipelago mapperd port number"
>> +        },
>> +        {
>> +            .name = ARCHIPELAGO_OPT_VPORT,
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "Archipelago vlmcd port number"
>> +
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
>> +
>> +static int qemu_archipelago_open(BlockDriverState *bs,
>> +                                 QDict *options,
>> +                                 int bdrv_flags,
>> +                                 Error **errp)
>> +{
>> +    int ret = 0;
>> +    const char *volume, *segment_name;
>> +    QemuOpts *opts;
>> +    Error *local_err = NULL;
>> +    BDRVArchipelagoState *s = bs->opaque;
>> +
>> +    opts = qemu_opts_create(&archipelago_runtime_opts, NULL, 0, &error_abort);
>> +    qemu_opts_absorb_qdict(opts, options, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        qemu_opts_del(opts);
>> +        return -EINVAL;
>> +    }
>> +
>> +    s->mportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_MPORT, 1001);
>> +    s->vportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_VPORT, 501);
>> +
>> +    segment_name = qemu_opt_get(opts, ARCHIPELAGO_OPT_SEGMENT);
>> +    if (segment_name == NULL) {
>> +        s->segment_name = g_strdup("archipelago");
>> +    } else {
>> +        s->segment_name = g_strdup(segment_name);
>> +    }
>> +
>> +    volume = qemu_opt_get(opts, ARCHIPELAGO_OPT_VOLUME);
>> +    if (volume == NULL) {
>> +        error_setg(errp, "archipelago block driver requires the 'volume'"
>> +                   " option");
>> +        qemu_opts_del(opts);
>> +        return -EINVAL;
> s->segment_name is leaked here.
>
> You already have an exit label (err_exit) that cleans everything up,
> and g_free() is NULL safe (and bs->opaque is zero-initialized).
>
> You should be able to just set ret, and 'goto err_exit' in each error
> instance in qemu_archipelago_open() - this also gets rid of the extra
> qemu_opts_del() calls.

Missed it. Thanks for that!
>
>> +    }
>> +    s->volname = g_strdup(volume);
>> +
>> +    /* Initialize XSEG, join shared memory segment */
>> +    ret = qemu_archipelago_init(s);
>> +    if (ret < 0) {
>> +        error_setg(errp, "cannot initialize XSEG and join shared "
>> +                   "memory segment");
>> +        goto err_exit;
>> +    }
>> +
>> +    qemu_opts_del(opts);
>> +    return 0;
>> +
>> +err_exit:
>> +    g_free(s->volname);
>> +    g_free(s->segment_name);
>> +    qemu_opts_del(opts);
>> +    return ret;
>> +}
>> +
>> +static void qemu_archipelago_close(BlockDriverState *bs)
>> +{
>> +    int r, targetlen;
>> +    char *target;
>> +    struct xseg_request *req;
>> +    BDRVArchipelagoState *s = bs->opaque;
>> +
>> +    s->stopping = true;
>> +
>> +    qemu_mutex_lock(&s->request_mutex);
>> +    while (!s->th_is_signaled) {
>> +        qemu_cond_wait(&s->request_cond,
>> +                       &s->request_mutex);
>> +    }
>> +    qemu_mutex_unlock(&s->request_mutex);
>> +    qemu_thread_join(&s->request_th);
>> +    qemu_cond_destroy(&s->request_cond);
>> +    qemu_mutex_destroy(&s->request_mutex);
>> +
>> +    qemu_cond_destroy(&s->archip_cond);
>> +    qemu_mutex_destroy(&s->archip_mutex);
>> +
>> +    targetlen = strlen(s->volname);
> Should this be strlen(s->volname) + 1, to account for the '\0'?  Or
> does xseg_prep_request() just need the length of the non-null
> terminated string?

You are right, xseg_prep_request() needs the length of the non-null 
terminated string.

>
>> +    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
>> +    if (!req) {
>> +        archipelagolog("Cannot get XSEG request\n");
>> +        goto err_exit;
>> +    }
>> +    r = xseg_prep_request(s->xseg, req, targetlen, 0);
>> +    if (r < 0) {
>> +        xseg_put_request(s->xseg, req, s->srcport);
> What does this do here, if xseg_prep_request() failed?  Is it
> essentially a cleanup function?

Yes this is a cleanup function.

>> +        archipelagolog("Cannot prepare XSEG close request\n");
>> +        goto err_exit;
>> +    }
>> +
>> +    target = xseg_get_target(s->xseg, req);
>> +    memcpy(target, s->volname, targetlen);
>> +    req->size = req->datalen;
>> +    req->offset = 0;
>> +    req->op = X_CLOSE;
>> +
>> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
>> +    if (p == NoPort) {
>> +        xseg_put_request(s->xseg, req, s->srcport);
>> +        archipelagolog("Cannot submit XSEG close request\n");
>> +        goto err_exit;
>> +    }
>> +
>> +    xseg_signal(s->xseg, p);
>> +    wait_reply(s->xseg, s->srcport, s->port, req);
> This is another spot I am wondering if we could get stuck on a
> blocking call that could potentially wait forever... is there a
> timeout here?

For the same reasons I explained in xseg_receive(). We will resolve this 
by setting X_NONBLOCK.

>
>> +
>> +    xseg_put_request(s->xseg, req, s->srcport);
>> +
>> +err_exit:
>> +    g_free(s->volname);
>> +    g_free(s->segment_name);
>> +    xseg_quit_local_signal(s->xseg, s->srcport);
>> +    xseg_leave_dynport(s->xseg, s->port);
>> +    xseg_leave(s->xseg);
>> +}
>> +
>> +static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> +    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb;
>> +    aio_cb->cancelled = true;
>> +    while (aio_cb->status == -EINPROGRESS) {
>> +        qemu_aio_wait();
>> +    }
>> +    qemu_aio_release(aio_cb);
>> +}
>> +
>> +static const AIOCBInfo archipelago_aiocb_info = {
>> +    .aiocb_size = sizeof(ArchipelagoAIOCB),
>> +    .cancel = qemu_archipelago_aio_cancel,
>> +};
>> +
>> +static int __archipelago_submit_request(BDRVArchipelagoState *s,
>> +                                        uint64_t bufidx,
>> +                                        size_t count,
>> +                                        off_t offset,
>> +                                        ArchipelagoAIOCB *aio_cb,
>> +                                        ArchipelagoSegmentedRequest *segreq,
>> +                                        int op)
>> +{
>> +    int ret, targetlen;
>> +    char *target;
>> +    void *data = NULL;
>> +    struct xseg_request *req;
>> +    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
>> +
>> +    targetlen = strlen(s->volname);
>> +    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
>> +    if (!req) {
>> +        archipelagolog("Cannot get XSEG request\n");
>> +        goto err_exit2;
>> +    }
>> +    ret = xseg_prep_request(s->xseg, req, targetlen, count);
>> +    if (ret < 0) {
>> +        archipelagolog("Cannot prepare XSEG request\n");
>> +        goto err_exit;
>> +    }
>> +    target = xseg_get_target(s->xseg, req);
>> +    if (!target) {
>> +        archipelagolog("Cannot get XSEG target\n");
>> +        goto err_exit;
>> +    }
>> +    memcpy(target, s->volname, targetlen);
>> +    req->size = count;
>> +    req->offset = offset;
>> +
>> +    switch (op) {
>> +    case ARCHIP_OP_READ:
>> +        req->op = X_READ;
>> +        break;
>> +    case ARCHIP_OP_WRITE:
>> +        req->op = X_WRITE;
>> +        break;
>> +    }
>> +    reqdata->volname = s->volname;
>> +    reqdata->offset = offset;
>> +    reqdata->size = count;
>> +    reqdata->bufidx = bufidx;
>> +    reqdata->aio_cb = aio_cb;
>> +    reqdata->segreq = segreq;
>> +    reqdata->op = op;
>> +
>> +    xseg_set_req_data(s->xseg, req, reqdata);
>> +    if (op == ARCHIP_OP_WRITE) {
>> +        data = xseg_get_data(s->xseg, req);
>> +        if (!data) {
>> +            archipelagolog("Cannot get XSEG data\n");
>> +            goto err_exit;
>> +        }
>> +        memcpy(data, aio_cb->buffer + bufidx, count);
>> +    }
>> +
>> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
>> +    if (p == NoPort) {
>> +        archipelagolog("Could not submit XSEG request\n");
>> +        goto err_exit;
>> +    }
>> +    xseg_signal(s->xseg, p);
>> +    return 0;
>> +
>> +err_exit:
>> +    g_free(reqdata);
>> +    xseg_put_request(s->xseg, req, s->srcport);
>> +    return -EIO;
>> +err_exit2:
>> +    g_free(reqdata);
>> +    return -EIO;
>> +}
>> +
>> +static int archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
>> +                                        size_t count,
>> +                                        off_t offset,
>> +                                        ArchipelagoAIOCB *aio_cb,
>> +                                        int op)
>> +{
>> +    int i, ret, segments_nr, last_segment_size;
>> +    ArchipelagoSegmentedRequest *segreq;
>> +
>> +    segreq = g_malloc(sizeof(ArchipelagoSegmentedRequest));
>> +
>> +    if (op == ARCHIP_OP_FLUSH) {
>> +        segments_nr = 1;
>> +        segreq->ref = segments_nr;
>> +        segreq->total = count;
>> +        segreq->count = 0;
>> +        segreq->failed = 0;
>> +        ret = __archipelago_submit_request(s, 0, count, offset, aio_cb,
>> +                                           segreq, ARCHIP_OP_WRITE);
>> +        if (ret < 0) {
>> +            goto err_exit;
>> +        }
>> +        return 0;
>> +    }
>> +
>> +    segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
>> +                  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
>> +    last_segment_size = (int)(count % MAX_REQUEST_SIZE);
>> +
>> +    segreq->ref = segments_nr;
>> +    segreq->total = count;
>> +    segreq->count = 0;
>> +    segreq->failed = 0;
>> +
>> +    for (i = 0; i < segments_nr - 1; i++) {
>> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
>> +                                           MAX_REQUEST_SIZE,
>> +                                           offset + i * MAX_REQUEST_SIZE,
>> +                                           aio_cb, segreq, op);
>> +
>> +        if (ret < 0) {
>> +            goto err_exit;
>> +        }
>> +    }
>> +
>> +    if ((segments_nr > 1) && last_segment_size) {
>> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
>> +                                           last_segment_size,
>> +                                           offset + i * MAX_REQUEST_SIZE,
>> +                                           aio_cb, segreq, op);
>> +    } else if ((segments_nr > 1) && !last_segment_size) {
>> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
>> +                                           MAX_REQUEST_SIZE,
>> +                                           offset + i * MAX_REQUEST_SIZE,
>> +                                           aio_cb, segreq, op);
>> +    } else if (segments_nr == 1) {
>> +            ret = __archipelago_submit_request(s, 0, count, offset, aio_cb,
>> +                                               segreq, op);
>> +    }
>> +
>> +    if (ret < 0) {
>> +        goto err_exit;
>> +    }
>> +
>> +    return 0;
>> +
>> +err_exit:
>> +    __sync_add_and_fetch(&segreq->failed, 1);
>> +    if (segments_nr == 1) {
>> +        if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
>> +            g_free(segreq);
>> +        }
>> +    } else {
>> +        if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
>> +            g_free(segreq);
>> +        }
>> +    }
> Don't we run the risk of leaking segreq here?  The other place this is
> freed is in xseg_request_handler(), but could we run into a race
> condition where 's->stopping' is true, or even xseg_receive() just does not
> return a request?

If 's->stopping' is true means that _close() has been invoked. How QEMU 
handles unserviced requests while in the meantime someone invokes 
_close()? Does it wait for the requests to finish and then exits? Or it 
exits silently without checking for pending requests?

If xseg_receive() does not return an already submitted request then the 
problem is located in Archipelago stack. Someone should check why the 
pending requests are not serviced and resolve the problem. The question 
here is the same as before, how QEMU handles pending requests while in 
the meantime invokes _close()?

Until all pending requests are serviced successfully or not, segreq 
allocations will remain and not freed. Another approach could have been 
a linked list that tracks all submitted requests and handle them 
accordingly on _close().

Suggestions here are more than welcome!


>
>> +
>> +    return ret;
>> +}
>> +
>> +static BlockDriverAIOCB *qemu_archipelago_aio_rw(BlockDriverState *bs,
>> +                                                 int64_t sector_num,
>> +                                                 QEMUIOVector *qiov,
>> +                                                 int nb_sectors,
>> +                                                 BlockDriverCompletionFunc *cb,
>> +                                                 void *opaque,
>> +                                                 int op)
>> +{
>> +    ArchipelagoAIOCB *aio_cb;
>> +    BDRVArchipelagoState *s = bs->opaque;
>> +    int64_t size, off;
>> +    int ret;
>> +
>> +    aio_cb = qemu_aio_get(&archipelago_aiocb_info, bs, cb, opaque);
>> +    aio_cb->cmd = op;
>> +    aio_cb->qiov = qiov;
>> +
>> +    if (op != ARCHIP_OP_FLUSH) {
>> +        aio_cb->buffer = qemu_blockalign(bs, qiov->size);
>> +    } else {
>> +        aio_cb->buffer = NULL;
>> +    }
>> +
>> +    aio_cb->ret = 0;
>> +    aio_cb->s = s;
>> +    aio_cb->cancelled = false;
>> +    aio_cb->status = -EINPROGRESS;
>> +
>> +    if (op == ARCHIP_OP_WRITE) {
>> +        qemu_iovec_to_buf(aio_cb->qiov, 0, aio_cb->buffer, qiov->size);
>> +    }
>> +
>> +    off = sector_num * BDRV_SECTOR_SIZE;
>> +    size = nb_sectors * BDRV_SECTOR_SIZE;
>> +    aio_cb->size = size;
>> +
>> +    ret = archipelago_aio_segmented_rw(s, size, off,
>> +                                       aio_cb, op);
>> +    if (ret < 0) {
>> +        goto err_exit;
>> +    }
>> +    return &aio_cb->common;
>> +
>> +err_exit:
>> +    error_report("qemu_archipelago_aio_rw(): I/O Error\n");
>> +    qemu_vfree(aio_cb->buffer);
>> +    qemu_aio_release(aio_cb);
>> +    return NULL;
>> +}
>> +
>> +static BlockDriverAIOCB *qemu_archipelago_aio_readv(BlockDriverState *bs,
>> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>> +        BlockDriverCompletionFunc *cb, void *opaque)
>> +{
>> +    return qemu_archipelago_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
>> +                                   opaque, ARCHIP_OP_READ);
>> +}
>> +
>> +static BlockDriverAIOCB *qemu_archipelago_aio_writev(BlockDriverState *bs,
>> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>> +        BlockDriverCompletionFunc *cb, void *opaque)
>> +{
>> +    return qemu_archipelago_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
>> +                                   opaque, ARCHIP_OP_WRITE);
>> +}
>> +
>> +static int64_t archipelago_volume_info(BDRVArchipelagoState *s)
>> +{
>> +    uint64_t size;
>> +    int ret, targetlen;
>> +    struct xseg_request *req;
>> +    struct xseg_reply_info *xinfo;
>> +    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
>> +
>> +    const char *volname = s->volname;
>> +    targetlen = strlen(volname);
>> +    req = xseg_get_request(s->xseg, s->srcport, s->mportno, X_ALLOC);
>> +    if (!req) {
>> +        archipelagolog("Cannot get XSEG request\n");
>> +        goto err_exit2;
>> +    }
>> +    ret = xseg_prep_request(s->xseg, req, targetlen,
>> +                            sizeof(struct xseg_reply_info));
>> +    if (ret < 0) {
>> +        archipelagolog("Cannot prepare XSEG request\n");
>> +        goto err_exit;
>> +    }
>> +    char *target = xseg_get_target(s->xseg, req);
>> +    if (!target) {
>> +        archipelagolog("Cannot get XSEG target\n");
>> +        goto err_exit;
>> +    }
>> +    memcpy(target, volname, targetlen);
>> +    req->size = req->datalen;
>> +    req->offset = 0;
>> +    req->op = X_INFO;
>> +
>> +    reqdata->op = ARCHIP_OP_VOLINFO;
>> +    reqdata->volname = volname;
>> +    xseg_set_req_data(s->xseg, req, reqdata);
>> +
>> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
>> +    if (p == NoPort) {
>> +        archipelagolog("Cannot submit XSEG request\n");
>> +        goto err_exit;
>> +    }
>> +    xseg_signal(s->xseg, p);
>> +    qemu_mutex_lock(&s->archip_mutex);
>> +    while (!s->is_signaled) {
>> +        qemu_cond_wait(&s->archip_cond, &s->archip_mutex);
>> +    }
>> +    s->is_signaled = false;
>> +    qemu_mutex_unlock(&s->archip_mutex);
>> +
>> +    xinfo = (struct xseg_reply_info *) xseg_get_data(s->xseg, req);
>> +    size = xinfo->size;
>> +    xseg_put_request(s->xseg, req, s->srcport);
>> +    g_free(reqdata);
>> +    s->size = size;
>> +    return size;
>> +
>
>
>> +err_exit:
>> +    g_free(reqdata);
>> +    xseg_put_request(s->xseg, req, s->srcport);
>> +    return -1;
>> +err_exit2:
>> +    g_free(reqdata);
>> +    return -1;
>> +}
> This could be simplified to just:
>
>   err_exit:
>       xseg_put_request(s->xseg, req, s->srcport);
>   err_exit2:
>       g_free(reqdata);
>       return -1;
>   }
>
> Maybe it'd also be best to return -EIO (or other meaningful error
> value) instead of just -1, as this value gets passed along to
> .bdrv_getlength().
>
>> +
>> +static int64_t qemu_archipelago_getlength(BlockDriverState *bs)
>> +{
>> +    int64_t ret;
>> +    BDRVArchipelagoState *s = bs->opaque;
>> +
>> +    ret = archipelago_volume_info(s);
> (This is where I am talking about an error value such as -EIO may be
> better)

Yes, it seems a lot better. Thanks!

>
>> +    return ret;
>> +}
>> +
>> +static BlockDriverAIOCB *qemu_archipelago_aio_flush(BlockDriverState *bs,
>> +        BlockDriverCompletionFunc *cb, void *opaque)
>> +{
>> +    return qemu_archipelago_aio_rw(bs, 0, NULL, 0, cb, opaque,
>> +                                   ARCHIP_OP_FLUSH);
>> +}
>> +
>> +static BlockDriver bdrv_archipelago = {
>> +    .format_name         = "archipelago",
>> +    .protocol_name       = "archipelago",
>> +    .instance_size       = sizeof(BDRVArchipelagoState),
>> +    .bdrv_file_open      = qemu_archipelago_open,
>> +    .bdrv_close          = qemu_archipelago_close,
>> +    .bdrv_getlength      = qemu_archipelago_getlength,
>> +    .bdrv_aio_readv      = qemu_archipelago_aio_readv,
>> +    .bdrv_aio_writev     = qemu_archipelago_aio_writev,
>> +    .bdrv_aio_flush      = qemu_archipelago_aio_flush,
>> +    .bdrv_has_zero_init  = bdrv_has_zero_init_1,
>> +};
>> +
>> +static void bdrv_archipelago_init(void)
>> +{
>> +    bdrv_register(&bdrv_archipelago);
>> +}
>> +
>> +block_init(bdrv_archipelago_init);
>> diff --git a/configure b/configure
>> index 7102964..e4acd9c 100755
>> --- a/configure
>> +++ b/configure
>> @@ -326,6 +326,7 @@ seccomp=""
>>   glusterfs=""
>>   glusterfs_discard="no"
>>   glusterfs_zerofill="no"
>> +archipelago=""
>>   virtio_blk_data_plane=""
>>   gtk=""
>>   gtkabi=""
>> @@ -1087,6 +1088,10 @@ for opt do
>>     ;;
>>     --enable-glusterfs) glusterfs="yes"
>>     ;;
>> +  --disable-archipelago) archipelago="no"
>> +  ;;
>> +  --enable-archipelago) archipelago="yes"
>> +  ;;
>>     --disable-virtio-blk-data-plane) virtio_blk_data_plane="no"
>>     ;;
>>     --enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
>> @@ -1382,6 +1387,8 @@ Advanced options (experts only):
>>     --enable-coroutine-pool  enable coroutine freelist (better performance)
>>     --enable-glusterfs       enable GlusterFS backend
>>     --disable-glusterfs      disable GlusterFS backend
>> +  --enable-archipelago     enable Archipelago backend
>> +  --disable-archipelago    disable Archipelago backend
>>     --enable-gcov            enable test coverage analysis with gcov
>>     --gcov=GCOV              use specified gcov [$gcov_tool]
>>     --disable-tpm            disable TPM support
>> @@ -3051,6 +3058,33 @@ EOF
>>     fi
>>   fi
>>   
>> +
>> +##########################################
>> +# archipelago probe
>> +if test "$archipelago" != "no" ; then
>> +    cat > $TMPC <<EOF
>> +#include <stdio.h>
>> +#include <xseg/xseg.h>
>> +#include <xseg/protocol.h>
>> +int main(void) {
>> +    xseg_initialize();
>> +    return 0;
>> +}
>> +EOF
>> +    archipelago_libs=-lxseg
>> +    if compile_prog "" "$archipelago_libs"; then
>> +        archipelago="yes"
>> +        libs_tools="$archipelago_libs $libs_tools"
>> +        libs_softmmu="$archipelago_libs $libs_softmmu"
>> +    else
>> +      if test "$archipelago" = "yes" ; then
>> +        feature_not_found "Archipelago backend support" "Install libxseg devel"
>> +      fi
>> +      archipelago="no"
>> +    fi
>> +fi
>> +
>> +
>>   ##########################################
>>   # glusterfs probe
>>   if test "$glusterfs" != "no" ; then
>> @@ -4230,6 +4264,7 @@ echo "seccomp support   $seccomp"
>>   echo "coroutine backend $coroutine"
>>   echo "coroutine pool    $coroutine_pool"
>>   echo "GlusterFS support $glusterfs"
>> +echo "Archipelago support $archipelago"
>>   echo "virtio-blk-data-plane $virtio_blk_data_plane"
>>   echo "gcov              $gcov_tool"
>>   echo "gcov enabled      $gcov"
>> @@ -4665,6 +4700,11 @@ if test "$glusterfs_zerofill" = "yes" ; then
>>     echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
>>   fi
>>   
>> +if test "$archipelago" = "yes" ; then
>> +  echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
>> +  echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
>> +fi
>> +
>>   if test "$libssh2" = "yes" ; then
>>     echo "CONFIG_LIBSSH2=m" >> $config_host_mak
>>     echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
>> -- 
>> 1.7.10.4
>>
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-07-10 10:04     ` Chrysostomos Nanakos
@ 2014-07-10 14:02       ` Chrysostomos Nanakos
  2014-07-22 12:40       ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Chrysostomos Nanakos @ 2014-07-10 14:02 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On 07/10/2014 01:04 PM, Chrysostomos Nanakos wrote:
> On 07/10/2014 03:23 AM, Jeff Cody wrote:
>> On Fri, Jun 27, 2014 at 11:24:08AM +0300, Chrysostomos Nanakos wrote:
>>> VM Image on Archipelago volume is specified like this:
>>>
>>> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[, 
>>>
>>> file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
>>>
>>> 'archipelago' is the protocol.
>>>
>>> 'mport' is the port number on which mapperd is listening. This is 
>>> optional
>>> and if not specified, QEMU will make Archipelago to use the default 
>>> port.
>>>
>>> 'vport' is the port number on which vlmcd is listening. This is 
>>> optional
>>> and if not specified, QEMU will make Archipelago to use the default 
>>> port.
>>>
>>> 'segment' is the name of the shared memory segment Archipelago stack 
>>> is using.
>>> This is optional and if not specified, QEMU will make Archipelago to 
>>> use the
>>> default value, 'archipelago'.
>>>
>>> Examples:
>>>
>>> file.driver=archipelago,file.volume=my_vm_volume
>>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
>>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>>> file.vport=1234
>>> file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>>> file.vport=1234,file.segment=my_segment
>>>
>>> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
>> This is just a superficial review, because I don't have a good idea of
>> what archipelago or libxseg really does (I didn't even compile it or
>> these patches).  But I scanned through this patch, and found a few
>> things, and had a few questions.
>
> No worries, every review is more than welcome.
>
>>
>>> ---
>>>   MAINTAINERS         |    6 +
>>>   block/Makefile.objs |    2 +
>>>   block/archipelago.c |  819 
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   configure           |   40 +++
>>>   4 files changed, 867 insertions(+)
>>>   create mode 100644 block/archipelago.c
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index 9b93edd..58ef1e3 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -999,3 +999,9 @@ SSH
>>>   M: Richard W.M. Jones <rjones@redhat.com>
>>>   S: Supported
>>>   F: block/ssh.c
>>> +
>>> +ARCHIPELAGO
>>> +M: Chrysostomos Nanakos <cnanakos@grnet.gr>
>>> +M: Chrysostomos Nanakos <chris@include.gr>
>>> +S: Maintained
>>> +F: block/archipelago.c
>>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>>> index fd88c03..858d2b3 100644
>>> --- a/block/Makefile.objs
>>> +++ b/block/Makefile.objs
>>> @@ -17,6 +17,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
>>>   block-obj-$(CONFIG_CURL) += curl.o
>>>   block-obj-$(CONFIG_RBD) += rbd.o
>>>   block-obj-$(CONFIG_GLUSTERFS) += gluster.o
>>> +block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>>>   block-obj-$(CONFIG_LIBSSH2) += ssh.o
>>>   endif
>>>   @@ -35,5 +36,6 @@ gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
>>>   gluster.o-libs     := $(GLUSTERFS_LIBS)
>>>   ssh.o-cflags       := $(LIBSSH2_CFLAGS)
>>>   ssh.o-libs         := $(LIBSSH2_LIBS)
>>> +archipelago.o-libs := $(ARCHIPELAGO_LIBS)
>>>   qcow.o-libs        := -lz
>>>   linux-aio.o-libs   := -laio
>>> diff --git a/block/archipelago.c b/block/archipelago.c
>>> new file mode 100644
>>> index 0000000..c56826a
>>> --- /dev/null
>>> +++ b/block/archipelago.c
>>> @@ -0,0 +1,819 @@
>>> +/*
>>> + * QEMU Block driver for Archipelago
>>> + *
>>> + * Copyright 2014 GRNET S.A. All rights reserved.
>>> + *
>>> + * Redistribution and use in source and binary forms, with or
>>> + * without modification, are permitted provided that the following
>>> + * conditions are met:
>>> + *
>>> + *   1. Redistributions of source code must retain the above
>>> + *      copyright notice, this list of conditions and the following
>>> + *      disclaimer.
>>> + *   2. Redistributions in binary form must reproduce the above
>>> + *      copyright notice, this list of conditions and the following
>>> + *      disclaimer in the documentation and/or other materials
>>> + *      provided with the distribution.
>>> + *
>>> + * THIS SOFTWARE IS PROVIDED BY GRNET S.A. ``AS IS'' AND ANY EXPRESS
>>> + * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
>>> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>>> + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GRNET S.A OR
>>> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>>> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>>> + * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
>>> + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>>> + * POSSIBILITY OF SUCH DAMAGE.
>>> + *
>>> + * The views and conclusions contained in the software and
>>> + * documentation are those of the authors and should not be
>>> + * interpreted as representing official policies, either expressed
>>> + * or implied, of GRNET S.A.
>>> + */
>>> +
>>> +/*
>>> +* VM Image on Archipelago volume is specified like this:
>>> +*
>>> +* 
>>> file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
>>> +* file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
>>> +*
>>> +* 'archipelago' is the protocol.
>>> +*
>>> +* 'mport' is the port number on which mapperd is listening. This is 
>>> optional
>>> +* and if not specified, QEMU will make Archipelago to use the 
>>> default port.
>>> +*
>>> +* 'vport' is the port number on which vlmcd is listening. This is 
>>> optional
>>> +* and if not specified, QEMU will make Archipelago to use the 
>>> default port.
>>> +*
>>> +* 'segment' is the name of the shared memory segment Archipelago 
>>> stack is using.
>>> +* This is optional and if not specified, QEMU will make Archipelago 
>>> to use the
>>> +* default value, 'archipelago'.
>>> +*
>>> +* Examples:
>>> +*
>>> +* file.driver=archipelago,file.volume=my_vm_volume
>>> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
>>> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>>> +* file.vport=1234
>>> +* file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
>>> +* file.vport=1234,file.segment=my_segment
>>> +*/
>>> +
>>> +#include "block/block_int.h"
>>> +#include "qemu/error-report.h"
>>> +#include "qemu/thread.h"
>>> +#include "qapi/qmp/qint.h"
>>> +#include "qapi/qmp/qstring.h"
>>> +#include "qapi/qmp/qjson.h"
>>> +
>>> +#include <inttypes.h>
>>> +#include <xseg/xseg.h>
>>> +#include <xseg/protocol.h>
>>> +
>>> +#define ARCHIP_FD_READ      0
>>> +#define ARCHIP_FD_WRITE     1
>>> +#define MAX_REQUEST_SIZE    524288
>>> +
>>> +#define ARCHIPELAGO_OPT_VOLUME      "volume"
>>> +#define ARCHIPELAGO_OPT_SEGMENT     "segment"
>>> +#define ARCHIPELAGO_OPT_MPORT       "mport"
>>> +#define ARCHIPELAGO_OPT_VPORT       "vport"
>>> +
>>> +#define archipelagolog(fmt, ...) \
>>> +    do {                         \
>>> +        fprintf(stderr, "archipelago\t%-24s: " fmt, __func__, 
>>> ##__VA_ARGS__); \
>>> +    } while (0)
>>> +
>>> +typedef enum {
>>> +    ARCHIP_OP_READ,
>>> +    ARCHIP_OP_WRITE,
>>> +    ARCHIP_OP_FLUSH,
>>> +    ARCHIP_OP_VOLINFO,
>>> +} ARCHIPCmd;
>>> +
>>> +typedef struct ArchipelagoAIOCB {
>>> +    BlockDriverAIOCB common;
>>> +    QEMUBH *bh;
>>> +    struct BDRVArchipelagoState *s;
>>> +    QEMUIOVector *qiov;
>>> +    void *buffer;
>>> +    ARCHIPCmd cmd;
>>> +    bool cancelled;
>>> +    int status;
>>> +    int64_t size;
>>> +    int64_t ret;
>>> +} ArchipelagoAIOCB;
>>> +
>>> +typedef struct BDRVArchipelagoState {
>>> +    ArchipelagoAIOCB *event_acb;
>>> +    char *volname;
>>> +    char *segment_name;
>>> +    uint64_t size;
>>> +    /* Archipelago specific */
>>> +    struct xseg *xseg;
>> I assume s->xseg is allocated in xseg_join() - is it ever freed?  In
>> _close(), there is a final call to xseg_leave(s->xseg), but from what
>> I found in libxseg, it does not appear to be freed:
>> https://github.com/cnanakos/libxseg/blob/develop/src/xseg.c#L975
>>
>> Is it up to libxseg to free xseg, or the caller?
>
> libxseg allocated xseg and should free it also. I will fix that in 
> libxseg. Thanks!
>
>>> +    struct xseg_port *port;
>>> +    xport srcport;
>>> +    xport sport;
>>> +    xport mportno;
>>> +    xport vportno;
>>> +    QemuMutex archip_mutex;
>>> +    QemuCond archip_cond;
>>> +    bool is_signaled;
>>> +    /* Request handler specific */
>>> +    QemuThread request_th;
>>> +    QemuCond request_cond;
>>> +    QemuMutex request_mutex;
>>> +    bool th_is_signaled;
>>> +    bool stopping;
>>> +} BDRVArchipelagoState;
>>> +
>>> +typedef struct ArchipelagoSegmentedRequest {
>>> +    size_t count;
>>> +    size_t total;
>>> +    int ref;
>>> +    int failed;
>>> +} ArchipelagoSegmentedRequest;
>>> +
>>> +typedef struct AIORequestData {
>>> +    const char *volname;
>>> +    off_t offset;
>>> +    size_t size;
>>> +    uint64_t bufidx;
>>> +    int ret;
>>> +    int op;
>>> +    ArchipelagoAIOCB *aio_cb;
>>> +    ArchipelagoSegmentedRequest *segreq;
>>> +} AIORequestData;
>>> +
>>> +static void qemu_archipelago_complete_aio(void *opaque);
>>> +
>>> +static void init_local_signal(struct xseg *xseg, xport sport, xport 
>>> srcport)
>>> +{
>>> +    if (xseg && (sport != srcport)) {
>>> +        xseg_init_local_signal(xseg, srcport);
>>> +        sport = srcport;
>>> +    }
>>> +}
>>> +
>>> +static void archipelago_finish_aiocb(AIORequestData *reqdata)
>>> +{
>>> +    if (reqdata->aio_cb->ret != reqdata->segreq->total) {
>>> +        reqdata->aio_cb->ret = -EIO;
>>> +    } else if (reqdata->aio_cb->ret == reqdata->segreq->total) {
>>> +        reqdata->aio_cb->ret = 0;
>>> +    }
>>> +    reqdata->aio_cb->bh = aio_bh_new(
>>> + bdrv_get_aio_context(reqdata->aio_cb->common.bs),
>>> +                        qemu_archipelago_complete_aio, reqdata
>>> +                        );
>>> +    qemu_bh_schedule(reqdata->aio_cb->bh);
>>> +}
>>> +
>>> +static int wait_reply(struct xseg *xseg, xport srcport, struct 
>>> xseg_port *port,
>>> +                      struct xseg_request *expected_req)
>>> +{
>>> +    struct xseg_request *req;
>>> +    xseg_prepare_wait(xseg, srcport);
>>> +    void *psd = xseg_get_signal_desc(xseg, port);
>>> +    while (1) {
>>> +        req = xseg_receive(xseg, srcport, 0);
>>> +        if (req) {
>>> +            if (req != expected_req) {
>>> +                archipelagolog("Unknown received request\n");
>>> +                xseg_put_request(xseg, req, srcport);
>>> +            } else if (!(req->state & XS_SERVED)) {
>>> +                return -1;
>>> +            } else {
>>> +                break;
>>> +            }
>>> +        }
>>> +        xseg_wait_signal(xseg, psd, 100000UL);
>>> +    }
>>> +    xseg_cancel_wait(xseg, srcport);
>>> +    return 0;
>>> +}
>>> +
>>> +static void xseg_request_handler(void *state)
>>> +{
>>> +    BDRVArchipelagoState *s = (BDRVArchipelagoState *) state;
>>> +    void *psd = xseg_get_signal_desc(s->xseg, s->port);
>>> +    qemu_mutex_lock(&s->request_mutex);
>>> +
>>> +    while (!s->stopping) {
>>> +        struct xseg_request *req;
>>> +        void *data;
>>> +        xseg_prepare_wait(s->xseg, s->srcport);
>>> +        req = xseg_receive(s->xseg, s->srcport, 0);
>> Is this a blocking call?  If so, is there a timeout, and if not, what
>> scenarios (if any) could cause us to wait here indefinitely?
>
> xseg_receive won't block until a request is received but It will wait 
> until it takes
> the volume lock, check if there is or not a request and will return 
> after giving back the volume lock.
> If it can't take the lock, it could wait indefinitely, there is no 
> timeout for that at the moment
> but I could easily add this kind of functionality.
>
> On the other hand xseg_receive won't wait for the volume lock if the 
> caller sets X_NONBLOCK.
>
> After that, I believe I should set here and in wait_reply() also, the 
> X_NONBLOCK flag.


s/volume lock/receive queue lock/g



>
>
>
>>
>>> +        if (req) {
>>> +            AIORequestData *reqdata;
>>> +            ArchipelagoSegmentedRequest *segreq;
>>> +            xseg_get_req_data(s->xseg, req, (void **)&reqdata);
>>> +
>>> +            switch (reqdata->op) {
>>> +            case ARCHIP_OP_READ:
>>> +                    data = xseg_get_data(s->xseg, req);
>>> +                    segreq = reqdata->segreq;
>>> +                    segreq->count += req->serviced;
>>> +
>>> + qemu_iovec_from_buf(reqdata->aio_cb->qiov, reqdata->bufidx,
>>> +                                        data,
>>> +                                        req->serviced);
>>> +
>>> +                    xseg_put_request(s->xseg, req, s->srcport);
>>> +
>>> +                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 
>>> 0) {
>>> +                        if (!segreq->failed) {
>>> +                            reqdata->aio_cb->ret = segreq->count;
>>> + archipelago_finish_aiocb(reqdata);
>>> +                            g_free(segreq);
>>> +                        } else {
>>> +                            g_free(segreq);
>>> +                            g_free(reqdata);
>>> +                        }
>>> +                    } else {
>>> +                        g_free(reqdata);
>>> +                    }
>>> +                    break;
>>> +            case ARCHIP_OP_WRITE:
>>> +                    segreq = reqdata->segreq;
>>> +                    segreq->count += req->serviced;
>>> +                    xseg_put_request(s->xseg, req, s->srcport);
>>> +
>>> +                    if ((__sync_add_and_fetch(&segreq->ref, -1)) == 
>>> 0) {
>>> +                        if (!segreq->failed) {
>>> +                            reqdata->aio_cb->ret = segreq->count;
>>> + archipelago_finish_aiocb(reqdata);
>>> +                            g_free(segreq);
>>> +                        } else {
>>> +                            g_free(segreq);
>>> +                            g_free(reqdata);
>>> +                        }
>>> +                    } else {
>>> +                        g_free(reqdata);
>>> +
>> This (OP_WRITE / OP_READ) is where I am worried that we leak in error
>> cases, and a _close() won't clean it up (see later comments).
>
>>> +                    break;
>>> +            case ARCHIP_OP_VOLINFO:
>>> +                    s->is_signaled = true;
>>> +                    qemu_cond_signal(&s->archip_cond);
>>> +                    break;
>>> +            }
>>> +        } else {
>>> +            xseg_wait_signal(s->xseg, psd, 100000UL);
>>> +        }
>>> +        xseg_cancel_wait(s->xseg, s->srcport);
>>
>>
>>> +    }
>>> +
>>> +    s->th_is_signaled = true;
>>> +    qemu_cond_signal(&s->request_cond);
>>> +    qemu_mutex_unlock(&s->request_mutex);
>>> +    qemu_thread_exit(NULL);
>>> +}
>>> +
>>> +static int qemu_archipelago_xseg_init(BDRVArchipelagoState *s)
>>> +{
>>> +    if (xseg_initialize()) {
>>> +        archipelagolog("Cannot initialize XSEG\n");
>>> +        goto err_exit;
>>> +    }
>>> +
>>> +    s->xseg = xseg_join((char *)"posix", s->segment_name,
>>> +                        (char *)"posixfd", NULL);
>>> +    if (!s->xseg) {
>>> +        archipelagolog("Cannot join XSEG shared memory segment\n");
>>> +        goto err_exit;
>>> +    }
>>> +    s->port = xseg_bind_dynport(s->xseg);
>>> +    s->srcport = s->port->portno;
>>> +    init_local_signal(s->xseg, s->sport, s->srcport);
>>> +    return 0;
>>> +
>>> +err_exit:
>>> +    return -1;
>>> +}
>>> +
>>> +static int qemu_archipelago_init(BDRVArchipelagoState *s)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = qemu_archipelago_xseg_init(s);
>>> +    if (ret < 0) {
>>> +        error_report("Cannot initialize XSEG. Aborting...\n");
>>> +        goto err_exit;
>>> +    }
>>> +
>>> +    qemu_cond_init(&s->archip_cond);
>>> +    qemu_mutex_init(&s->archip_mutex);
>>> +    qemu_cond_init(&s->request_cond);
>>> +    qemu_mutex_init(&s->request_mutex);
>>> +    s->th_is_signaled = false;
>>> +    qemu_thread_create(&s->request_th, "xseg_io_th",
>>> +                       (void *) xseg_request_handler,
>>> +                       (void *) s, QEMU_THREAD_JOINABLE);
>>> +
>>> +err_exit:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qemu_archipelago_complete_aio(void *opaque)
>>> +{
>>> +    AIORequestData *reqdata = (AIORequestData *) opaque;
>>> +    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;
>>> +
>>> +    qemu_bh_delete(aio_cb->bh);
>>> +    qemu_vfree(aio_cb->buffer);
>>> +    aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
>>> +    aio_cb->status = 0;
>>> +
>>> +    if (!aio_cb->cancelled) {
>>> +        qemu_aio_release(aio_cb);
>>> +    }
>>> +    g_free(reqdata);
>>> +}
>>> +
>>> +static QemuOptsList archipelago_runtime_opts = {
>>> +    .name = "archipelago",
>>> +    .head = QTAILQ_HEAD_INITIALIZER(archipelago_runtime_opts.head),
>>> +    .desc = {
>>> +        {
>>> +            .name = ARCHIPELAGO_OPT_VOLUME,
>>> +            .type = QEMU_OPT_STRING,
>>> +            .help = "Name of the volume image",
>>> +        },
>>> +        {
>>> +            .name = ARCHIPELAGO_OPT_SEGMENT,
>>> +            .type = QEMU_OPT_STRING,
>>> +            .help = "Name of the Archipelago shared memory segment",
>>> +        },
>>> +        {
>>> +            .name = ARCHIPELAGO_OPT_MPORT,
>>> +            .type = QEMU_OPT_NUMBER,
>>> +            .help = "Archipelago mapperd port number"
>>> +        },
>>> +        {
>>> +            .name = ARCHIPELAGO_OPT_VPORT,
>>> +            .type = QEMU_OPT_NUMBER,
>>> +            .help = "Archipelago vlmcd port number"
>>> +
>>> +        },
>>> +        { /* end of list */ }
>>> +    },
>>> +};
>>> +
>>> +static int qemu_archipelago_open(BlockDriverState *bs,
>>> +                                 QDict *options,
>>> +                                 int bdrv_flags,
>>> +                                 Error **errp)
>>> +{
>>> +    int ret = 0;
>>> +    const char *volume, *segment_name;
>>> +    QemuOpts *opts;
>>> +    Error *local_err = NULL;
>>> +    BDRVArchipelagoState *s = bs->opaque;
>>> +
>>> +    opts = qemu_opts_create(&archipelago_runtime_opts, NULL, 0, 
>>> &error_abort);
>>> +    qemu_opts_absorb_qdict(opts, options, &local_err);
>>> +    if (local_err) {
>>> +        error_propagate(errp, local_err);
>>> +        qemu_opts_del(opts);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    s->mportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_MPORT, 
>>> 1001);
>>> +    s->vportno = qemu_opt_get_number(opts, ARCHIPELAGO_OPT_VPORT, 
>>> 501);
>>> +
>>> +    segment_name = qemu_opt_get(opts, ARCHIPELAGO_OPT_SEGMENT);
>>> +    if (segment_name == NULL) {
>>> +        s->segment_name = g_strdup("archipelago");
>>> +    } else {
>>> +        s->segment_name = g_strdup(segment_name);
>>> +    }
>>> +
>>> +    volume = qemu_opt_get(opts, ARCHIPELAGO_OPT_VOLUME);
>>> +    if (volume == NULL) {
>>> +        error_setg(errp, "archipelago block driver requires the 
>>> 'volume'"
>>> +                   " option");
>>> +        qemu_opts_del(opts);
>>> +        return -EINVAL;
>> s->segment_name is leaked here.
>>
>> You already have an exit label (err_exit) that cleans everything up,
>> and g_free() is NULL safe (and bs->opaque is zero-initialized).
>>
>> You should be able to just set ret, and 'goto err_exit' in each error
>> instance in qemu_archipelago_open() - this also gets rid of the extra
>> qemu_opts_del() calls.
>
> Missed it. Thanks for that!
>>
>>> +    }
>>> +    s->volname = g_strdup(volume);
>>> +
>>> +    /* Initialize XSEG, join shared memory segment */
>>> +    ret = qemu_archipelago_init(s);
>>> +    if (ret < 0) {
>>> +        error_setg(errp, "cannot initialize XSEG and join shared "
>>> +                   "memory segment");
>>> +        goto err_exit;
>>> +    }
>>> +
>>> +    qemu_opts_del(opts);
>>> +    return 0;
>>> +
>>> +err_exit:
>>> +    g_free(s->volname);
>>> +    g_free(s->segment_name);
>>> +    qemu_opts_del(opts);
>>> +    return ret;
>>> +}
>>> +
>>> +static void qemu_archipelago_close(BlockDriverState *bs)
>>> +{
>>> +    int r, targetlen;
>>> +    char *target;
>>> +    struct xseg_request *req;
>>> +    BDRVArchipelagoState *s = bs->opaque;
>>> +
>>> +    s->stopping = true;
>>> +
>>> +    qemu_mutex_lock(&s->request_mutex);
>>> +    while (!s->th_is_signaled) {
>>> +        qemu_cond_wait(&s->request_cond,
>>> +                       &s->request_mutex);
>>> +    }
>>> +    qemu_mutex_unlock(&s->request_mutex);
>>> +    qemu_thread_join(&s->request_th);
>>> +    qemu_cond_destroy(&s->request_cond);
>>> +    qemu_mutex_destroy(&s->request_mutex);
>>> +
>>> +    qemu_cond_destroy(&s->archip_cond);
>>> +    qemu_mutex_destroy(&s->archip_mutex);
>>> +
>>> +    targetlen = strlen(s->volname);
>> Should this be strlen(s->volname) + 1, to account for the '\0'?  Or
>> does xseg_prep_request() just need the length of the non-null
>> terminated string?
>
> You are right, xseg_prep_request() needs the length of the non-null 
> terminated string.
>
>>
>>> +    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
>>> +    if (!req) {
>>> +        archipelagolog("Cannot get XSEG request\n");
>>> +        goto err_exit;
>>> +    }
>>> +    r = xseg_prep_request(s->xseg, req, targetlen, 0);
>>> +    if (r < 0) {
>>> +        xseg_put_request(s->xseg, req, s->srcport);
>> What does this do here, if xseg_prep_request() failed?  Is it
>> essentially a cleanup function?
>
> Yes this is a cleanup function.
>
>>> +        archipelagolog("Cannot prepare XSEG close request\n");
>>> +        goto err_exit;
>>> +    }
>>> +
>>> +    target = xseg_get_target(s->xseg, req);
>>> +    memcpy(target, s->volname, targetlen);
>>> +    req->size = req->datalen;
>>> +    req->offset = 0;
>>> +    req->op = X_CLOSE;
>>> +
>>> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
>>> +    if (p == NoPort) {
>>> +        xseg_put_request(s->xseg, req, s->srcport);
>>> +        archipelagolog("Cannot submit XSEG close request\n");
>>> +        goto err_exit;
>>> +    }
>>> +
>>> +    xseg_signal(s->xseg, p);
>>> +    wait_reply(s->xseg, s->srcport, s->port, req);
>> This is another spot I am wondering if we could get stuck on a
>> blocking call that could potentially wait forever... is there a
>> timeout here?
>
> For the same reasons I explained in xseg_receive(). We will resolve 
> this by setting X_NONBLOCK.
>
>>
>>> +
>>> +    xseg_put_request(s->xseg, req, s->srcport);
>>> +
>>> +err_exit:
>>> +    g_free(s->volname);
>>> +    g_free(s->segment_name);
>>> +    xseg_quit_local_signal(s->xseg, s->srcport);
>>> +    xseg_leave_dynport(s->xseg, s->port);
>>> +    xseg_leave(s->xseg);
>>> +}
>>> +
>>> +static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb)
>>> +{
>>> +    ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb;
>>> +    aio_cb->cancelled = true;
>>> +    while (aio_cb->status == -EINPROGRESS) {
>>> +        qemu_aio_wait();
>>> +    }
>>> +    qemu_aio_release(aio_cb);
>>> +}
>>> +
>>> +static const AIOCBInfo archipelago_aiocb_info = {
>>> +    .aiocb_size = sizeof(ArchipelagoAIOCB),
>>> +    .cancel = qemu_archipelago_aio_cancel,
>>> +};
>>> +
>>> +static int __archipelago_submit_request(BDRVArchipelagoState *s,
>>> +                                        uint64_t bufidx,
>>> +                                        size_t count,
>>> +                                        off_t offset,
>>> +                                        ArchipelagoAIOCB *aio_cb,
>>> + ArchipelagoSegmentedRequest *segreq,
>>> +                                        int op)
>>> +{
>>> +    int ret, targetlen;
>>> +    char *target;
>>> +    void *data = NULL;
>>> +    struct xseg_request *req;
>>> +    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
>>> +
>>> +    targetlen = strlen(s->volname);
>>> +    req = xseg_get_request(s->xseg, s->srcport, s->vportno, X_ALLOC);
>>> +    if (!req) {
>>> +        archipelagolog("Cannot get XSEG request\n");
>>> +        goto err_exit2;
>>> +    }
>>> +    ret = xseg_prep_request(s->xseg, req, targetlen, count);
>>> +    if (ret < 0) {
>>> +        archipelagolog("Cannot prepare XSEG request\n");
>>> +        goto err_exit;
>>> +    }
>>> +    target = xseg_get_target(s->xseg, req);
>>> +    if (!target) {
>>> +        archipelagolog("Cannot get XSEG target\n");
>>> +        goto err_exit;
>>> +    }
>>> +    memcpy(target, s->volname, targetlen);
>>> +    req->size = count;
>>> +    req->offset = offset;
>>> +
>>> +    switch (op) {
>>> +    case ARCHIP_OP_READ:
>>> +        req->op = X_READ;
>>> +        break;
>>> +    case ARCHIP_OP_WRITE:
>>> +        req->op = X_WRITE;
>>> +        break;
>>> +    }
>>> +    reqdata->volname = s->volname;
>>> +    reqdata->offset = offset;
>>> +    reqdata->size = count;
>>> +    reqdata->bufidx = bufidx;
>>> +    reqdata->aio_cb = aio_cb;
>>> +    reqdata->segreq = segreq;
>>> +    reqdata->op = op;
>>> +
>>> +    xseg_set_req_data(s->xseg, req, reqdata);
>>> +    if (op == ARCHIP_OP_WRITE) {
>>> +        data = xseg_get_data(s->xseg, req);
>>> +        if (!data) {
>>> +            archipelagolog("Cannot get XSEG data\n");
>>> +            goto err_exit;
>>> +        }
>>> +        memcpy(data, aio_cb->buffer + bufidx, count);
>>> +    }
>>> +
>>> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
>>> +    if (p == NoPort) {
>>> +        archipelagolog("Could not submit XSEG request\n");
>>> +        goto err_exit;
>>> +    }
>>> +    xseg_signal(s->xseg, p);
>>> +    return 0;
>>> +
>>> +err_exit:
>>> +    g_free(reqdata);
>>> +    xseg_put_request(s->xseg, req, s->srcport);
>>> +    return -EIO;
>>> +err_exit2:
>>> +    g_free(reqdata);
>>> +    return -EIO;
>>> +}
>>> +
>>> +static int archipelago_aio_segmented_rw(BDRVArchipelagoState *s,
>>> +                                        size_t count,
>>> +                                        off_t offset,
>>> +                                        ArchipelagoAIOCB *aio_cb,
>>> +                                        int op)
>>> +{
>>> +    int i, ret, segments_nr, last_segment_size;
>>> +    ArchipelagoSegmentedRequest *segreq;
>>> +
>>> +    segreq = g_malloc(sizeof(ArchipelagoSegmentedRequest));
>>> +
>>> +    if (op == ARCHIP_OP_FLUSH) {
>>> +        segments_nr = 1;
>>> +        segreq->ref = segments_nr;
>>> +        segreq->total = count;
>>> +        segreq->count = 0;
>>> +        segreq->failed = 0;
>>> +        ret = __archipelago_submit_request(s, 0, count, offset, 
>>> aio_cb,
>>> +                                           segreq, ARCHIP_OP_WRITE);
>>> +        if (ret < 0) {
>>> +            goto err_exit;
>>> +        }
>>> +        return 0;
>>> +    }
>>> +
>>> +    segments_nr = (int)(count / MAX_REQUEST_SIZE) + \
>>> +                  ((count % MAX_REQUEST_SIZE) ? 1 : 0);
>>> +    last_segment_size = (int)(count % MAX_REQUEST_SIZE);
>>> +
>>> +    segreq->ref = segments_nr;
>>> +    segreq->total = count;
>>> +    segreq->count = 0;
>>> +    segreq->failed = 0;
>>> +
>>> +    for (i = 0; i < segments_nr - 1; i++) {
>>> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
>>> +                                           MAX_REQUEST_SIZE,
>>> +                                           offset + i * 
>>> MAX_REQUEST_SIZE,
>>> +                                           aio_cb, segreq, op);
>>> +
>>> +        if (ret < 0) {
>>> +            goto err_exit;
>>> +        }
>>> +    }
>>> +
>>> +    if ((segments_nr > 1) && last_segment_size) {
>>> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
>>> +                                           last_segment_size,
>>> +                                           offset + i * 
>>> MAX_REQUEST_SIZE,
>>> +                                           aio_cb, segreq, op);
>>> +    } else if ((segments_nr > 1) && !last_segment_size) {
>>> +        ret = __archipelago_submit_request(s, i * MAX_REQUEST_SIZE,
>>> +                                           MAX_REQUEST_SIZE,
>>> +                                           offset + i * 
>>> MAX_REQUEST_SIZE,
>>> +                                           aio_cb, segreq, op);
>>> +    } else if (segments_nr == 1) {
>>> +            ret = __archipelago_submit_request(s, 0, count, offset, 
>>> aio_cb,
>>> +                                               segreq, op);
>>> +    }
>>> +
>>> +    if (ret < 0) {
>>> +        goto err_exit;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +err_exit:
>>> +    __sync_add_and_fetch(&segreq->failed, 1);
>>> +    if (segments_nr == 1) {
>>> +        if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
>>> +            g_free(segreq);
>>> +        }
>>> +    } else {
>>> +        if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) 
>>> == 0) {
>>> +            g_free(segreq);
>>> +        }
>>> +    }
>> Don't we run the risk of leaking segreq here?  The other place this is
>> freed is in xseg_request_handler(), but could we run into a race
>> condition where 's->stopping' is true, or even xseg_receive() just 
>> does not
>> return a request?
>
> If 's->stopping' is true means that _close() has been invoked. How 
> QEMU handles unserviced requests while in the meantime someone invokes 
> _close()? Does it wait for the requests to finish and then exits? Or 
> it exits silently without checking for pending requests?
>
> If xseg_receive() does not return an already submitted request then 
> the problem is located in Archipelago stack. Someone should check why 
> the pending requests are not serviced and resolve the problem. The 
> question here is the same as before, how QEMU handles pending requests 
> while in the meantime invokes _close()?
>
> Until all pending requests are serviced successfully or not, segreq 
> allocations will remain and not freed. Another approach could have 
> been a linked list that tracks all submitted requests and handle them 
> accordingly on _close().
>
> Suggestions here are more than welcome!
>
>
>>
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static BlockDriverAIOCB *qemu_archipelago_aio_rw(BlockDriverState *bs,
>>> +                                                 int64_t sector_num,
>>> +                                                 QEMUIOVector *qiov,
>>> +                                                 int nb_sectors,
>>> + BlockDriverCompletionFunc *cb,
>>> +                                                 void *opaque,
>>> +                                                 int op)
>>> +{
>>> +    ArchipelagoAIOCB *aio_cb;
>>> +    BDRVArchipelagoState *s = bs->opaque;
>>> +    int64_t size, off;
>>> +    int ret;
>>> +
>>> +    aio_cb = qemu_aio_get(&archipelago_aiocb_info, bs, cb, opaque);
>>> +    aio_cb->cmd = op;
>>> +    aio_cb->qiov = qiov;
>>> +
>>> +    if (op != ARCHIP_OP_FLUSH) {
>>> +        aio_cb->buffer = qemu_blockalign(bs, qiov->size);
>>> +    } else {
>>> +        aio_cb->buffer = NULL;
>>> +    }
>>> +
>>> +    aio_cb->ret = 0;
>>> +    aio_cb->s = s;
>>> +    aio_cb->cancelled = false;
>>> +    aio_cb->status = -EINPROGRESS;
>>> +
>>> +    if (op == ARCHIP_OP_WRITE) {
>>> +        qemu_iovec_to_buf(aio_cb->qiov, 0, aio_cb->buffer, 
>>> qiov->size);
>>> +    }
>>> +
>>> +    off = sector_num * BDRV_SECTOR_SIZE;
>>> +    size = nb_sectors * BDRV_SECTOR_SIZE;
>>> +    aio_cb->size = size;
>>> +
>>> +    ret = archipelago_aio_segmented_rw(s, size, off,
>>> +                                       aio_cb, op);
>>> +    if (ret < 0) {
>>> +        goto err_exit;
>>> +    }
>>> +    return &aio_cb->common;
>>> +
>>> +err_exit:
>>> +    error_report("qemu_archipelago_aio_rw(): I/O Error\n");
>>> +    qemu_vfree(aio_cb->buffer);
>>> +    qemu_aio_release(aio_cb);
>>> +    return NULL;
>>> +}
>>> +
>>> +static BlockDriverAIOCB 
>>> *qemu_archipelago_aio_readv(BlockDriverState *bs,
>>> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>>> +        BlockDriverCompletionFunc *cb, void *opaque)
>>> +{
>>> +    return qemu_archipelago_aio_rw(bs, sector_num, qiov, 
>>> nb_sectors, cb,
>>> +                                   opaque, ARCHIP_OP_READ);
>>> +}
>>> +
>>> +static BlockDriverAIOCB 
>>> *qemu_archipelago_aio_writev(BlockDriverState *bs,
>>> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>>> +        BlockDriverCompletionFunc *cb, void *opaque)
>>> +{
>>> +    return qemu_archipelago_aio_rw(bs, sector_num, qiov, 
>>> nb_sectors, cb,
>>> +                                   opaque, ARCHIP_OP_WRITE);
>>> +}
>>> +
>>> +static int64_t archipelago_volume_info(BDRVArchipelagoState *s)
>>> +{
>>> +    uint64_t size;
>>> +    int ret, targetlen;
>>> +    struct xseg_request *req;
>>> +    struct xseg_reply_info *xinfo;
>>> +    AIORequestData *reqdata = g_malloc(sizeof(AIORequestData));
>>> +
>>> +    const char *volname = s->volname;
>>> +    targetlen = strlen(volname);
>>> +    req = xseg_get_request(s->xseg, s->srcport, s->mportno, X_ALLOC);
>>> +    if (!req) {
>>> +        archipelagolog("Cannot get XSEG request\n");
>>> +        goto err_exit2;
>>> +    }
>>> +    ret = xseg_prep_request(s->xseg, req, targetlen,
>>> +                            sizeof(struct xseg_reply_info));
>>> +    if (ret < 0) {
>>> +        archipelagolog("Cannot prepare XSEG request\n");
>>> +        goto err_exit;
>>> +    }
>>> +    char *target = xseg_get_target(s->xseg, req);
>>> +    if (!target) {
>>> +        archipelagolog("Cannot get XSEG target\n");
>>> +        goto err_exit;
>>> +    }
>>> +    memcpy(target, volname, targetlen);
>>> +    req->size = req->datalen;
>>> +    req->offset = 0;
>>> +    req->op = X_INFO;
>>> +
>>> +    reqdata->op = ARCHIP_OP_VOLINFO;
>>> +    reqdata->volname = volname;
>>> +    xseg_set_req_data(s->xseg, req, reqdata);
>>> +
>>> +    xport p = xseg_submit(s->xseg, req, s->srcport, X_ALLOC);
>>> +    if (p == NoPort) {
>>> +        archipelagolog("Cannot submit XSEG request\n");
>>> +        goto err_exit;
>>> +    }
>>> +    xseg_signal(s->xseg, p);
>>> +    qemu_mutex_lock(&s->archip_mutex);
>>> +    while (!s->is_signaled) {
>>> +        qemu_cond_wait(&s->archip_cond, &s->archip_mutex);
>>> +    }
>>> +    s->is_signaled = false;
>>> +    qemu_mutex_unlock(&s->archip_mutex);
>>> +
>>> +    xinfo = (struct xseg_reply_info *) xseg_get_data(s->xseg, req);
>>> +    size = xinfo->size;
>>> +    xseg_put_request(s->xseg, req, s->srcport);
>>> +    g_free(reqdata);
>>> +    s->size = size;
>>> +    return size;
>>> +
>>
>>
>>> +err_exit:
>>> +    g_free(reqdata);
>>> +    xseg_put_request(s->xseg, req, s->srcport);
>>> +    return -1;
>>> +err_exit2:
>>> +    g_free(reqdata);
>>> +    return -1;
>>> +}
>> This could be simplified to just:
>>
>>   err_exit:
>>       xseg_put_request(s->xseg, req, s->srcport);
>>   err_exit2:
>>       g_free(reqdata);
>>       return -1;
>>   }
>>
>> Maybe it'd also be best to return -EIO (or other meaningful error
>> value) instead of just -1, as this value gets passed along to
>> .bdrv_getlength().
>>
>>> +
>>> +static int64_t qemu_archipelago_getlength(BlockDriverState *bs)
>>> +{
>>> +    int64_t ret;
>>> +    BDRVArchipelagoState *s = bs->opaque;
>>> +
>>> +    ret = archipelago_volume_info(s);
>> (This is where I am talking about an error value such as -EIO may be
>> better)
>
> Yes, it seems a lot better. Thanks!
>
>>
>>> +    return ret;
>>> +}
>>> +
>>> +static BlockDriverAIOCB 
>>> *qemu_archipelago_aio_flush(BlockDriverState *bs,
>>> +        BlockDriverCompletionFunc *cb, void *opaque)
>>> +{
>>> +    return qemu_archipelago_aio_rw(bs, 0, NULL, 0, cb, opaque,
>>> +                                   ARCHIP_OP_FLUSH);
>>> +}
>>> +
>>> +static BlockDriver bdrv_archipelago = {
>>> +    .format_name         = "archipelago",
>>> +    .protocol_name       = "archipelago",
>>> +    .instance_size       = sizeof(BDRVArchipelagoState),
>>> +    .bdrv_file_open      = qemu_archipelago_open,
>>> +    .bdrv_close          = qemu_archipelago_close,
>>> +    .bdrv_getlength      = qemu_archipelago_getlength,
>>> +    .bdrv_aio_readv      = qemu_archipelago_aio_readv,
>>> +    .bdrv_aio_writev     = qemu_archipelago_aio_writev,
>>> +    .bdrv_aio_flush      = qemu_archipelago_aio_flush,
>>> +    .bdrv_has_zero_init  = bdrv_has_zero_init_1,
>>> +};
>>> +
>>> +static void bdrv_archipelago_init(void)
>>> +{
>>> +    bdrv_register(&bdrv_archipelago);
>>> +}
>>> +
>>> +block_init(bdrv_archipelago_init);
>>> diff --git a/configure b/configure
>>> index 7102964..e4acd9c 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -326,6 +326,7 @@ seccomp=""
>>>   glusterfs=""
>>>   glusterfs_discard="no"
>>>   glusterfs_zerofill="no"
>>> +archipelago=""
>>>   virtio_blk_data_plane=""
>>>   gtk=""
>>>   gtkabi=""
>>> @@ -1087,6 +1088,10 @@ for opt do
>>>     ;;
>>>     --enable-glusterfs) glusterfs="yes"
>>>     ;;
>>> +  --disable-archipelago) archipelago="no"
>>> +  ;;
>>> +  --enable-archipelago) archipelago="yes"
>>> +  ;;
>>>     --disable-virtio-blk-data-plane) virtio_blk_data_plane="no"
>>>     ;;
>>>     --enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
>>> @@ -1382,6 +1387,8 @@ Advanced options (experts only):
>>>     --enable-coroutine-pool  enable coroutine freelist (better 
>>> performance)
>>>     --enable-glusterfs       enable GlusterFS backend
>>>     --disable-glusterfs      disable GlusterFS backend
>>> +  --enable-archipelago     enable Archipelago backend
>>> +  --disable-archipelago    disable Archipelago backend
>>>     --enable-gcov            enable test coverage analysis with gcov
>>>     --gcov=GCOV              use specified gcov [$gcov_tool]
>>>     --disable-tpm            disable TPM support
>>> @@ -3051,6 +3058,33 @@ EOF
>>>     fi
>>>   fi
>>>   +
>>> +##########################################
>>> +# archipelago probe
>>> +if test "$archipelago" != "no" ; then
>>> +    cat > $TMPC <<EOF
>>> +#include <stdio.h>
>>> +#include <xseg/xseg.h>
>>> +#include <xseg/protocol.h>
>>> +int main(void) {
>>> +    xseg_initialize();
>>> +    return 0;
>>> +}
>>> +EOF
>>> +    archipelago_libs=-lxseg
>>> +    if compile_prog "" "$archipelago_libs"; then
>>> +        archipelago="yes"
>>> +        libs_tools="$archipelago_libs $libs_tools"
>>> +        libs_softmmu="$archipelago_libs $libs_softmmu"
>>> +    else
>>> +      if test "$archipelago" = "yes" ; then
>>> +        feature_not_found "Archipelago backend support" "Install 
>>> libxseg devel"
>>> +      fi
>>> +      archipelago="no"
>>> +    fi
>>> +fi
>>> +
>>> +
>>>   ##########################################
>>>   # glusterfs probe
>>>   if test "$glusterfs" != "no" ; then
>>> @@ -4230,6 +4264,7 @@ echo "seccomp support   $seccomp"
>>>   echo "coroutine backend $coroutine"
>>>   echo "coroutine pool    $coroutine_pool"
>>>   echo "GlusterFS support $glusterfs"
>>> +echo "Archipelago support $archipelago"
>>>   echo "virtio-blk-data-plane $virtio_blk_data_plane"
>>>   echo "gcov              $gcov_tool"
>>>   echo "gcov enabled      $gcov"
>>> @@ -4665,6 +4700,11 @@ if test "$glusterfs_zerofill" = "yes" ; then
>>>     echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
>>>   fi
>>>   +if test "$archipelago" = "yes" ; then
>>> +  echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
>>> +  echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
>>> +fi
>>> +
>>>   if test "$libssh2" = "yes" ; then
>>>     echo "CONFIG_LIBSSH2=m" >> $config_host_mak
>>>     echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
>>> -- 
>>> 1.7.10.4
>>>
>>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 2/5] block/archipelago: Implement bdrv_parse_filename()
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 2/5] block/archipelago: Implement bdrv_parse_filename() Chrysostomos Nanakos
@ 2014-07-21 15:55   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2014-07-21 15:55 UTC (permalink / raw)
  To: Chrysostomos Nanakos; +Cc: kwolf, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

On Fri, Jun 27, 2014 at 11:24:09AM +0300, Chrysostomos Nanakos wrote:
> VM Image on Archipelago volume can also be specified like this:
> 
> file=archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>][:
> segment=<segment_name>]]
> 
> Examples:
> 
> file=archipelago:my_vm_volume
> file=archipelago:my_vm_volume/mport=123
> file=archipelago:my_vm_volume/mport=123:vport=1234
> file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment
> 
> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
> ---
>  block/archipelago.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 137 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images Chrysostomos Nanakos
  2014-07-02 14:01   ` Eric Blake
@ 2014-07-21 16:01   ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2014-07-21 16:01 UTC (permalink / raw)
  To: Chrysostomos Nanakos; +Cc: kwolf, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 3646 bytes --]

On Fri, Jun 27, 2014 at 11:24:10AM +0300, Chrysostomos Nanakos wrote:
> +static int qemu_archipelago_create_volume(Error **errp, const char *volname,
> +                                          char *segment_name,
> +                                          uint64_t size, xport mportno,
> +                                          xport vportno)
> +{
> +    int ret, targetlen;
> +    struct xseg *xseg = NULL;
> +    struct xseg_request *req;
> +    struct xseg_request_clone *xclone;
> +    struct xseg_port *port;
> +    xport srcport = NoPort, sport = NoPort;
> +    char *target;
> +
> +    /* Try default values if none has been set */
> +    if (mportno == (xport) -1) {
> +        mportno = 1001;
> +    }
> +
> +    if (vportno == (xport) -1) {
> +        vportno = 501;
> +    }
> +
> +    if (segment_name == NULL) {
> +        segment_name = g_strdup("archipelago");
> +    }
> +
> +    if (xseg_initialize()) {
> +        error_setg(errp, "Cannot initialize XSEG");
> +        return -1;

Leaks segment_name (if internally allocated)

> +    }
> +
> +    xseg = xseg_join((char *)"posix", segment_name,
> +                     (char *)"posixfd", NULL);
> +
> +    if (!xseg) {
> +        error_setg(errp, "Cannot join XSEG shared memory segment");
> +        return -1;

Leaks segment_name (if internally allocated)

> +    }
> +
> +    port = xseg_bind_dynport(xseg);
> +    srcport = port->portno;
> +    init_local_signal(xseg, sport, srcport);
> +
> +    req = xseg_get_request(xseg, srcport, mportno, X_ALLOC);
> +    if (!req) {
> +        error_setg(errp, "Cannot get XSEG request");
> +        return -1;

Leaks segment_name (if internally allocated)

> +    }
> +
> +    targetlen = strlen(volname);
> +    ret = xseg_prep_request(xseg, req, targetlen,
> +                            sizeof(struct xseg_request_clone));
> +    if (ret < 0) {
> +        error_setg(errp, "Cannot prepare XSEG request");
> +        goto err_exit;
> +    }
> +
> +    target = xseg_get_target(xseg, req);
> +    if (!target) {
> +        error_setg(errp, "Cannot get XSEG target.\n");
> +        goto err_exit;
> +    }
> +    memcpy(target, volname, targetlen);
> +    xclone = (struct xseg_request_clone *) xseg_get_data(xseg, req);
> +    memset(xclone->target, 0 , XSEG_MAX_TARGETLEN);
> +    xclone->targetlen = 0;
> +    xclone->size = size;
> +    req->offset = 0;
> +    req->size = req->datalen;
> +    req->op = X_CLONE;
> +
> +    xport p = xseg_submit(xseg, req, srcport, X_ALLOC);
> +    if (p == NoPort) {
> +        error_setg(errp, "Could not submit XSEG request");
> +        goto err_exit;
> +    }
> +    xseg_signal(xseg, p);
> +
> +    ret = wait_reply(xseg, srcport, port, req);
> +    if (ret < 0) {
> +        error_setg(errp, "wait_reply() error.");
> +    }
> +
> +    xseg_put_request(xseg, req, srcport);
> +    xseg_quit_local_signal(xseg, srcport);
> +    xseg_leave_dynport(xseg, port);
> +    xseg_leave(xseg);
> +    return ret;
> +
> +err_exit:
> +    xseg_put_request(xseg, req, srcport);
> +    xseg_quit_local_signal(xseg, srcport);
> +    xseg_leave_dynport(xseg, port);
> +    xseg_leave(xseg);
> +    return -1;

Leaks segment_name (if internally allocated)

> +    /* Create an Archipelago volume */
> +    ret = qemu_archipelago_create_volume(errp, volname, segment_name,
> +                                         total_size, mport,
> +                                         vport);
> +
> +    if (volname) {
> +        g_free(volname);
> +    }
> +    if (segment_name) {
> +        g_free(segment_name);
> +    }

g_free(NULL) is a nop.  The if statement isn't needed to check NULL
pointers.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 5/5] qemu-iotests: add support for Archipelago protocol
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 5/5] qemu-iotests: add support for Archipelago protocol Chrysostomos Nanakos
@ 2014-07-21 16:02   ` Stefan Hajnoczi
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2014-07-21 16:02 UTC (permalink / raw)
  To: Chrysostomos Nanakos; +Cc: kwolf, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 333 bytes --]

On Fri, Jun 27, 2014 at 11:24:12AM +0300, Chrysostomos Nanakos wrote:
> Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
> ---
>  tests/qemu-iotests/common    |    6 ++++++
>  tests/qemu-iotests/common.rc |    9 ++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
  2014-07-02 13:59   ` Eric Blake
  2014-07-10  0:23   ` Jeff Cody
@ 2014-07-22 12:35   ` Stefan Hajnoczi
  2 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2014-07-22 12:35 UTC (permalink / raw)
  To: Chrysostomos Nanakos; +Cc: kwolf, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 469 bytes --]

On Fri, Jun 27, 2014 at 11:24:08AM +0300, Chrysostomos Nanakos wrote:
> +    xseg_set_req_data(s->xseg, req, reqdata);
> +    if (op == ARCHIP_OP_WRITE) {
> +        data = xseg_get_data(s->xseg, req);
> +        if (!data) {
> +            archipelagolog("Cannot get XSEG data\n");
> +            goto err_exit;
> +        }
> +        memcpy(data, aio_cb->buffer + bufidx, count);
> +    }

Can you avoid ->buffer and use iov_to_buf() or qemu_iovec_to_buf()
instead?

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/5] block: Support Archipelago as a QEMU block backend
  2014-07-10 10:04     ` Chrysostomos Nanakos
  2014-07-10 14:02       ` Chrysostomos Nanakos
@ 2014-07-22 12:40       ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2014-07-22 12:40 UTC (permalink / raw)
  To: Chrysostomos Nanakos; +Cc: kwolf, Jeff Cody, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1850 bytes --]

On Thu, Jul 10, 2014 at 01:04:54PM +0300, Chrysostomos Nanakos wrote:
> On 07/10/2014 03:23 AM, Jeff Cody wrote:
> >On Fri, Jun 27, 2014 at 11:24:08AM +0300, Chrysostomos Nanakos wrote:
> >>+err_exit:
> >>+    __sync_add_and_fetch(&segreq->failed, 1);
> >>+    if (segments_nr == 1) {
> >>+        if (__sync_add_and_fetch(&segreq->ref, -1) == 0) {
> >>+            g_free(segreq);
> >>+        }
> >>+    } else {
> >>+        if ((__sync_add_and_fetch(&segreq->ref, -segments_nr + i)) == 0) {
> >>+            g_free(segreq);
> >>+        }
> >>+    }
> >Don't we run the risk of leaking segreq here?  The other place this is
> >freed is in xseg_request_handler(), but could we run into a race
> >condition where 's->stopping' is true, or even xseg_receive() just does not
> >return a request?
> 
> If 's->stopping' is true means that _close() has been invoked. How QEMU
> handles unserviced requests while in the meantime someone invokes _close()?
> Does it wait for the requests to finish and then exits? Or it exits silently
> without checking for pending requests?
> 
> If xseg_receive() does not return an already submitted request then the
> problem is located in Archipelago stack. Someone should check why the
> pending requests are not serviced and resolve the problem. The question here
> is the same as before, how QEMU handles pending requests while in the
> meantime invokes _close()?
> 
> Until all pending requests are serviced successfully or not, segreq
> allocations will remain and not freed. Another approach could have been a
> linked list that tracks all submitted requests and handle them accordingly
> on _close().
> 
> Suggestions here are more than welcome!

bdrv_close() drains all requests before invoking .bdrv_close().  I think
there is no race condition in that case.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-07-22 12:40 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-27  8:24 [Qemu-devel] [PATCH v6 0/5] Support Archipelago as a QEMU block backend Chrysostomos Nanakos
2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 1/5] block: " Chrysostomos Nanakos
2014-07-02 13:59   ` Eric Blake
2014-07-02 14:18     ` Chrysostomos Nanakos
2014-07-02 14:30       ` Eric Blake
2014-07-10  0:23   ` Jeff Cody
2014-07-10 10:04     ` Chrysostomos Nanakos
2014-07-10 14:02       ` Chrysostomos Nanakos
2014-07-22 12:40       ` Stefan Hajnoczi
2014-07-22 12:35   ` Stefan Hajnoczi
2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 2/5] block/archipelago: Implement bdrv_parse_filename() Chrysostomos Nanakos
2014-07-21 15:55   ` Stefan Hajnoczi
2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 3/5] block/archipelago: Add support for creating images Chrysostomos Nanakos
2014-07-02 14:01   ` Eric Blake
2014-07-02 14:06     ` Chrysostomos Nanakos
2014-07-21 16:01   ` Stefan Hajnoczi
2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 4/5] QMP: Add support for Archipelago Chrysostomos Nanakos
2014-07-02 13:58   ` Eric Blake
2014-07-02 14:11     ` Chrysostomos Nanakos
2014-07-02 14:22       ` Eric Blake
2014-06-27  8:24 ` [Qemu-devel] [PATCH v6 5/5] qemu-iotests: add support for Archipelago protocol Chrysostomos Nanakos
2014-07-21 16:02   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.