* [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
@ 2015-12-16 16:55 Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic Alex Pyrgiotis
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

Hi all,

This patch series attempts to boost the performance of the
"scsi-generic" and "scsi-block" device types by removing an extra data
copy and reducing their memory footprint. More specifically, the
problem lies in the functions of the `scsi_generic_req_ops` struct in
scsi-generic.c. These functions rely on an intermediate buffer to
perform the SG_IO ioctl request, without checking whether the SCSI
controller has provided a scatter-gather list with the request.

In a nutshell, our proposal is to map the provided scatter-gather list
(if any) to the address space of the QEMU process and use the resulting
iovec as the buffer for the ioctl request. You'll find that the logic is
quite similar to the one used in scsi-disk.c.
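
For illustration, the kernel mechanism we rely on can be exercised from
plain user space: an SG_IO request may carry an iovec instead of a flat
buffer. The sketch below is self-contained and not code from the
series; the device node and the split of the 8-byte READ CAPACITY(10)
payload into two iovec entries are arbitrary:

    /* Illustrative only: READ CAPACITY(10) via SG_IO, with the data
     * buffer described by a two-entry iovec instead of a flat buffer.
     * Note that SG_FLAG_DIRECT_IO is left unset; it does not work with
     * iovecs. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <scsi/sg.h>

    int main(void)
    {
        unsigned char cdb[10] = { 0x25 };      /* READ CAPACITY(10) */
        unsigned char sense[32];
        unsigned char lba[4], blksz[4];        /* 8-byte response, split */
        sg_iovec_t iov[2] = {
            { lba, sizeof(lba) },
            { blksz, sizeof(blksz) },
        };
        sg_io_hdr_t hdr;
        int fd = open("/dev/sg0", O_RDWR);     /* placeholder node */

        if (fd < 0) {
            perror("open");
            return 1;
        }
        memset(&hdr, 0, sizeof(hdr));
        hdr.interface_id = 'S';
        hdr.dxfer_direction = SG_DXFER_FROM_DEV;
        hdr.cmdp = cdb;
        hdr.cmd_len = sizeof(cdb);
        hdr.sbp = sense;
        hdr.mx_sb_len = sizeof(sense);
        hdr.iovec_count = 2;                   /* scatter-gather mode */
        hdr.dxferp = iov;                      /* array of sg_iovec */
        hdr.dxfer_len = sizeof(lba) + sizeof(blksz);
        hdr.timeout = 5000;                    /* milliseconds */

        if (ioctl(fd, SG_IO, &hdr) < 0) {
            perror("SG_IO");
            close(fd);
            return 1;
        }
        printf("status=%d resid=%d\n", hdr.status, hdr.resid);
        close(fd);
        return 0;
    }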

Cheers,
Alex

Alex Pyrgiotis (9):
  dma-helpers: Expose the sg mapping logic
  dma-helpers: Add support for ioctl operations
  dma-helpers: Do not truncate small qiovs
  scsi-generic: Add common functions
  scsi-generic: Separate `sg_io_hdr' initializations
  scsi-generic: Make request execution buf-specific
  scsi-generic: Make data-copying logic clearer
  scsi-generic: Factor out response interception
  scsi-generic: Allow full scatter-gather support

 dma-helpers.c          | 184 ++++++++++++++++++---
 hw/scsi/scsi-generic.c | 422 +++++++++++++++++++++++++++++++++++++------------
 include/sysemu/dma.h   |   8 +
 trace-events           |   4 +-
 4 files changed, 488 insertions(+), 130 deletions(-)

-- 
2.6.2

* [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2016-02-11 11:17   ` Paolo Bonzini
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 2/9] dma-helpers: Add support for ioctl operations Alex Pyrgiotis
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

The mapping of scatter-gather lists from physical addresses (as
perceived by the guest kernel) to the virtual address space of the QEMU
process is a vital step for a DMA operation. This step is currently
implemented, amongst other things, inside dma_blk_cb(), which makes it
impossible for other code to reuse it.

In order to pave the way for the DMA support of ioctl commands, expose
the aforementioned logic in a separate function called "dma_map_sg".
Also expose some other important pieces, such as the initialization of
the dbs structure.
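
For reference, the heart of the logic being extracted looks roughly as
follows. This is a paraphrase of the existing loop in dma-helpers.c,
not a literal copy; it assumes QEMU's internal headers and types, and
it omits the map-client registration used to retry when no bounce
buffers are available:

    /* Sketch of the mapping loop moved into dma_map_sg(). */
    static void dma_map_sg_sketch(DMAAIOCB *dbs)
    {
        dma_addr_t cur_addr, cur_len;
        void *mem;

        while (dbs->sg_cur_index < dbs->sg->nsg) {
            cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
            cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
            mem = dma_memory_map(dbs->sg->as, cur_addr, &cur_len, dbs->dir);
            if (!mem) {
                break;    /* can't map more now; a partial I/O follows */
            }
            qemu_iovec_add(&dbs->iov, mem, cur_len);
            dbs->sg_cur_byte += cur_len;
            if (dbs->sg_cur_byte == dbs->sg->sg[dbs->sg_cur_index].len) {
                dbs->sg_cur_byte = 0;
                ++dbs->sg_cur_index;
            }
        }
    }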

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/dma-helpers.c b/dma-helpers.c
index 4faec5d..c38661e 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -79,9 +79,10 @@ typedef struct {
     QEMUIOVector iov;
     QEMUBH *bh;
     DMAIOFunc *io_func;
+    BlockCompletionFunc *dma_cb;
 } DMAAIOCB;
 
-static void dma_blk_cb(void *opaque, int ret);
+static void dma_blk_io_cb(void *opaque, int ret);
 
 static void reschedule_dma(void *opaque)
 {
@@ -89,7 +90,7 @@ static void reschedule_dma(void *opaque)
 
     qemu_bh_delete(dbs->bh);
     dbs->bh = NULL;
-    dma_blk_cb(dbs, 0);
+    dbs->dma_cb(dbs, 0);
 }
 
 static void dma_blk_unmap(DMAAIOCB *dbs)
@@ -120,21 +121,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
     qemu_aio_unref(dbs);
 }
 
-static void dma_blk_cb(void *opaque, int ret)
+/*
+ * Create a QEMUIOVector from a scatter-gather list.
+ *
+ * This function does not copy the data of the scatter-gather list. Instead, it
+ * uses the dma_memory_map() function to map the guest physical memory regions
+ * (as seen by the guest kernel) into the address space of the QEMU process, in
+ * order to have access to the data.
+ */
+static void dma_map_sg(DMAAIOCB *dbs)
 {
-    DMAAIOCB *dbs = (DMAAIOCB *)opaque;
     dma_addr_t cur_addr, cur_len;
     void *mem;
 
-    trace_dma_blk_cb(dbs, ret);
-
-    dbs->acb = NULL;
-    dbs->sector_num += dbs->iov.size / 512;
-
-    if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
-        dma_complete(dbs, ret);
-        return;
-    }
     dma_blk_unmap(dbs);
 
     while (dbs->sg_cur_index < dbs->sg->nsg) {
@@ -162,9 +161,38 @@ static void dma_blk_cb(void *opaque, int ret)
     if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
         qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
     }
+}
+
+/*
+ * Callback function for DMA read/write operations.
+ *
+ * This function initiates the read/write operation and also acts as a
+ * completion callback. It uses the dma_map_sg() function to map the
+ * scatter-gather list to a QEMUIOVector and then passes this iovec to the
+ * underlying read/write I/O function.
+ *
+ * If the DMA operation cannot take place in one step, e.g. because not all
+ * scatter-gather entries could be mapped, then this function does a partial
+ * I/O operation; once that completes, it is invoked again as the completion
+ * callback and retries the rest of the I/O operation.
+ */
+static void dma_blk_io_cb(void *opaque, int ret)
+{
+    DMAAIOCB *dbs = (DMAAIOCB *)opaque;
+
+    trace_dma_blk_io_cb(dbs, ret);
+
+    dbs->acb = NULL;
+    dbs->sector_num += dbs->iov.size / 512;
+
+    if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
+        dma_complete(dbs, ret);
+        return;
+    }
+
+    dma_map_sg(dbs);
 
     dbs->acb = dbs->io_func(dbs->blk, dbs->sector_num, &dbs->iov,
-                            dbs->iov.size / 512, dma_blk_cb, dbs);
+                            dbs->iov.size / 512, dma_blk_io_cb, dbs);
     assert(dbs->acb);
 }
 
@@ -190,6 +218,22 @@ static const AIOCBInfo dma_aiocb_info = {
     .cancel_async       = dma_aio_cancel,
 };
 
+/*
+ * Initialize the dbs structure and the QEMUIOVector to sane defaults.
+ */
+static void dma_init_dbs(DMAAIOCB *dbs, BlockBackend *blk, QEMUSGList *sg,
+                         DMADirection dir)
+{
+    dbs->acb = NULL;
+    dbs->blk = blk;
+    dbs->sg = sg;
+    dbs->sg_cur_index = 0;
+    dbs->sg_cur_byte = 0;
+    dbs->dir = dir;
+    dbs->bh = NULL;
+    qemu_iovec_init(&dbs->iov, sg->nsg);
+}
+
 BlockAIOCB *dma_blk_io(
     BlockBackend *blk, QEMUSGList *sg, uint64_t sector_num,
     DMAIOFunc *io_func, BlockCompletionFunc *cb,
@@ -199,21 +243,14 @@ BlockAIOCB *dma_blk_io(
 
     trace_dma_blk_io(dbs, blk, sector_num, (dir == DMA_DIRECTION_TO_DEVICE));
 
-    dbs->acb = NULL;
-    dbs->blk = blk;
-    dbs->sg = sg;
+    dma_init_dbs(dbs, blk, sg, dir);
     dbs->sector_num = sector_num;
-    dbs->sg_cur_index = 0;
-    dbs->sg_cur_byte = 0;
-    dbs->dir = dir;
     dbs->io_func = io_func;
-    dbs->bh = NULL;
-    qemu_iovec_init(&dbs->iov, sg->nsg);
-    dma_blk_cb(dbs, 0);
+    dbs->dma_cb = dma_blk_io_cb;
+    dbs->dma_cb(dbs, 0);
     return &dbs->common;
 }
 
-
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
                          QEMUSGList *sg, uint64_t sector,
                          void (*cb)(void *opaque, int ret), void *opaque)
diff --git a/trace-events b/trace-events
index 2fce98e..120cdd4 100644
--- a/trace-events
+++ b/trace-events
@@ -1127,7 +1127,7 @@ win_helper_retry(uint32_t tl) "tl=%d"
 dma_blk_io(void *dbs, void *bs, int64_t sector_num, bool to_dev) "dbs=%p bs=%p sector_num=%" PRId64 " to_dev=%d"
 dma_aio_cancel(void *dbs) "dbs=%p"
 dma_complete(void *dbs, int ret, void *cb) "dbs=%p ret=%d cb=%p"
-dma_blk_cb(void *dbs, int ret) "dbs=%p ret=%d"
+dma_blk_io_cb(void *dbs, int ret) "dbs=%p ret=%d"
 dma_map_wait(void *dbs) "dbs=%p"
 
 # ui/console.c
-- 
2.6.2

* [Qemu-devel] [PATCH 2/9] dma-helpers: Add support for ioctl operations
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs Alex Pyrgiotis
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

Allow ioctl operations to benefit from the DMA functionality created for
the read/write operations. More specifically, create a function called
"dma_blk_ioctl" that uses the existing code for mapping scatter-gather
lists to qiovs and ultimately calls the blk_aio_ioctl() function to
perform the actual ioctl operation.

Also, in order to use the qiov in the ioctl request, the caller can
specify a qiov handler when calling dma_blk_ioctl(). This handler is
called after the creation of the qiov and before the actual ioctl
request, which allows the caller to update the ioctl request with the
iovec stored in the qiov.
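
A hypothetical caller would wire this up roughly as follows. The MyReq
struct, my_complete() and my_submit_read() are illustrative names, not
part of the series (patch 9 does essentially this for SG_IO inside
scsi-generic.c):

    /* Sketch only; assumes QEMU's internal headers and a caller-defined
     * MyReq that embeds an sg_io_hdr and a QEMUSGList pointer. */
    static void my_handle_iov(void *opaque, QEMUIOVector *qiov)
    {
        MyReq *r = opaque;

        /* Point the prepared sg_io_hdr at the freshly mapped iovec. */
        r->io_header.iovec_count = qiov->niov;
        r->io_header.dxferp = qiov->iov;
        r->io_header.dxfer_len = qiov->size;
    }

    static BlockAIOCB *my_submit_read(BlockBackend *blk, MyReq *r)
    {
        return dma_blk_ioctl(blk, SG_IO, &r->io_header, r->sg,
                             DMA_DIRECTION_FROM_DEVICE,
                             my_handle_iov, my_complete, r);
    }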

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/dma-helpers.c b/dma-helpers.c
index c38661e..e1ea7b3 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -79,10 +79,14 @@ typedef struct {
     QEMUIOVector iov;
     QEMUBH *bh;
     DMAIOFunc *io_func;
+    unsigned int ioctl_req_type;
+    void *ioctl_req;
+    void (*handle_iov)(void *opaque, QEMUIOVector *iov);
     BlockCompletionFunc *dma_cb;
 } DMAAIOCB;
 
 static void dma_blk_io_cb(void *opaque, int ret);
+static void dma_blk_ioctl_cb(void *opaque, int ret);
 
 static void reschedule_dma(void *opaque)
 {
@@ -196,6 +200,42 @@ static void dma_blk_io_cb(void *opaque, int ret)
     assert(dbs->acb);
 }
 
+/*
+ * Callback function for DMA ioctl operations.
+ *
+ * This function is similar to the dma_blk_io_cb() function. The only addition
+ * is that it checks whether the caller has specified a handler for the
+ * resulting iovec, and calls it before starting the ioctl operation.
+ */
+static void dma_blk_ioctl_cb(void *opaque, int ret)
+{
+    DMAAIOCB *dbs = (DMAAIOCB *)opaque;
+
+    trace_dma_blk_ioctl_cb(dbs, ret);
+
+    dbs->acb = NULL;
+
+    if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
+        dma_complete(dbs, ret);
+        return;
+    }
+
+    dma_map_sg(dbs);
+
+    /*
+     * If the user has specified a handler for the resulting iovec, call it
+     * before starting the ioctl.
+     */
+    if (dbs->handle_iov) {
+        dbs->handle_iov(dbs->common.opaque, &dbs->iov);
+    }
+
+    dbs->acb = blk_aio_ioctl(dbs->blk, dbs->ioctl_req_type, dbs->ioctl_req,
+            dma_blk_ioctl_cb, dbs);
+
+    assert(dbs->acb);
+}
+
 static void dma_aio_cancel(BlockAIOCB *acb)
 {
     DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
@@ -267,6 +307,54 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
                       DMA_DIRECTION_TO_DEVICE);
 }
 
+/**
+ * dma_blk_ioctl - perform an ioctl request using a scatter-gather list
+ * @blk:             The BlockBackend struct of the underlying device
+ * @ioctl_req_type:  The type of the control function to perform
+ * @ioctl_req:       The payload for the action
+ * @sg:              The scatter-gather list describing the request's data
+ * @dir:             The direction of the DMA request
+ * @iov_cb:          The callback for handling the qiov that results from the
+ *                   mapping of the scatter-gather list.
+ *                   Expected parameters:
+ *                   @ opaque:          The opaque struct that is provided
+ *                                      with this call (see below)
+ *                   @ iov:             The QEMUIOVector that is mapped to the
+ *                                      scatter-gather list
+ * @complete_cb:     The completion callback for this request.
+ *                   Expected parameters:
+ *                   @ opaque:          The opaque struct that is provided
+ *                                      with this call (see below)
+ *                   @ ret:             The return value of the ioctl call
+ * @opaque:          Private data of the caller
+ *
+ * Description:
+ *     This function calls dma_map_sg, which maps the provided scatter-gather
+ *     list to a qiov. Next, the iov_cb handler is called with the `opaque' and
+ *     qiov as arguments. Then, the ioctl request is enqueued and the function
+ *     returns.  The caller will be notified about the completion of the
+ *     request with the complete_cb handler.
+ */
+BlockAIOCB *dma_blk_ioctl(BlockBackend *blk,
+                          unsigned long int ioctl_req_type,
+                          void *ioctl_req,
+                          QEMUSGList *sg,
+                          DMADirection dir,
+                          void (*iov_cb)(void *opaque, QEMUIOVector *iov),
+                          void (*complete_cb)(void *opaque, int ret),
+                          void *opaque)
+{
+    DMAAIOCB *dbs = blk_aio_get(&dma_aiocb_info, blk, complete_cb, opaque);
+
+    trace_dma_blk_ioctl(dbs, blk, (dir == DMA_DIRECTION_TO_DEVICE));
+
+    dma_init_dbs(dbs, blk, sg, dir);
+    dbs->handle_iov = iov_cb;
+    dbs->ioctl_req = ioctl_req;
+    dbs->ioctl_req_type = ioctl_req_type;
+    dbs->dma_cb = dma_blk_ioctl_cb;
+    dbs->dma_cb(dbs, 0);
+    return &dbs->common;
+}
 
 static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg,
                            DMADirection dir)
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index efa8b99..aaf561d 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -212,6 +212,14 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
                           QEMUSGList *sg, uint64_t sector,
                           BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *dma_blk_ioctl(BlockBackend *blk,
+                          unsigned long int ioctl_req_type,
+                          void *ioctl_req,
+                          QEMUSGList *sg,
+                          DMADirection dir,
+                          void (*iov_cb)(void *opaque, QEMUIOVector *iov),
+                          void (*complete_cb)(void *opaque, int ret),
+                          void *opaque);
 uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 
diff --git a/trace-events b/trace-events
index 120cdd4..0cee005 100644
--- a/trace-events
+++ b/trace-events
@@ -1125,9 +1125,11 @@ win_helper_retry(uint32_t tl) "tl=%d"
 
 # dma-helpers.c
 dma_blk_io(void *dbs, void *bs, int64_t sector_num, bool to_dev) "dbs=%p bs=%p sector_num=%" PRId64 " to_dev=%d"
+dma_blk_ioctl(void *dbs, void *bs, bool to_dev) "dbs=%p bs=%p to_dev=%d"
 dma_aio_cancel(void *dbs) "dbs=%p"
 dma_complete(void *dbs, int ret, void *cb) "dbs=%p ret=%d cb=%p"
 dma_blk_io_cb(void *dbs, int ret) "dbs=%p ret=%d"
+dma_blk_ioctl_cb(void *dbs, int ret) "dbs=%p ret=%d"
 dma_map_wait(void *dbs) "dbs=%p"
 
 # ui/console.c
-- 
2.6.2

* [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 2/9] dma-helpers: Add support for ioctl operations Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2016-02-11 11:08   ` Paolo Bonzini
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 4/9] scsi-generic: Add common functions Alex Pyrgiotis
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

If the size of the qiov is smaller than the sector size, do not truncate
the qiov, which would effectively make it empty. Instead, allow it to
pass as is.

This is necessary for SCSI requests like READ CAPACITY which have small
buffers, e.g. 32 bytes.
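
To make the arithmetic concrete, here is a stand-alone check of the two
cases; the macros mirror those in block.h, and the whole snippet is a
demonstration, not part of the patch:

    #include <assert.h>
    #include <stdint.h>

    #define BDRV_SECTOR_SIZE 512ULL
    #define BDRV_SECTOR_MASK (~(BDRV_SECTOR_SIZE - 1))

    int main(void)
    {
        uint64_t big = 9 * BDRV_SECTOR_SIZE + 32;   /* 4640 bytes */
        uint64_t small = 32;                        /* e.g. READ CAPACITY */

        /* Sub-sector tail and larger than one sector: still truncated,
         * back to 9 whole sectors (4608 bytes). */
        assert((big & ~BDRV_SECTOR_MASK) && big > BDRV_SECTOR_SIZE);

        /* Sub-sector tail but not larger than one sector: previously
         * truncated down to zero bytes, now passed through untouched. */
        assert((small & ~BDRV_SECTOR_MASK) && !(small > BDRV_SECTOR_SIZE));
        return 0;
    }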

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/dma-helpers.c b/dma-helpers.c
index e1ea7b3..b8f2ae0 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -162,7 +162,16 @@ static void dma_map_sg(DMAAIOCB *dbs)
         return;
     }
 
-    if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
+    /*
+     * If the size of the qiov is not a multiple of the sector size, truncate
+     * the qiov.
+     *
+     * NOTE: If the qiov is less than a sector, we can assume that there is a
+     * reason for it, e.g., a SCSI request such as READ CAPACITY, and we should
+     * not truncate it.
+     */
+    if (dbs->iov.size & ~BDRV_SECTOR_MASK &&
+            dbs->iov.size > BDRV_SECTOR_SIZE) {
         qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
     }
 }
-- 
2.6.2

* [Qemu-devel] [PATCH 4/9] scsi-generic: Add common functions
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (2 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 5/9] scsi-generic: Separate `sg_io_hdr' initializations Alex Pyrgiotis
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

In the `scsi_generic_req_ops' struct, instead of pointing to the
implementations of read_data/write_data/send_command, point to wrappers
around these functions, prefixed with "common_".

Also, introduce the concept of "buffer" operations. Buffer operations
are the read/write operations that rely on an intermediate buffer
to perform a request. This is the current mode of operation, so we
prefix the existing implementations with "buf_". This paves the way for
the introduction of sg operations, i.e., operations that rely on
scatter-gather lists.

These "common" functions can later serve as a single entry point for
the read/write operations, in which we decide whether the request will
use an intermediate buffer or the controller's scatter-gather list (see
the sketch below).
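
As a sketch of where this is headed, the wrapper eventually becomes a
dispatcher; the exact branch below only lands in patch 9, and until
then the wrapper simply forwards to the buf_ implementation:

    static void scsi_common_read_data(SCSIRequest *req)
    {
        if (req->sg) {
            scsi_sg_read_data(req);     /* controller gave us an SG list */
        } else {
            scsi_buf_read_data(req);    /* fall back to the bounce buffer */
        }
    }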

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index a4626f7..f24f472 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -175,7 +175,7 @@ static int execute_command(BlockBackend *blk,
     return 0;
 }
 
-static void scsi_read_complete(void * opaque, int ret)
+static void scsi_buf_read_complete(void *opaque, int ret)
 {
     SCSIGenericReq *r = (SCSIGenericReq *)opaque;
     SCSIDevice *s = r->req.dev;
@@ -229,13 +229,13 @@ static void scsi_read_complete(void * opaque, int ret)
 }
 
 /* Read more data from scsi device into buffer.  */
-static void scsi_read_data(SCSIRequest *req)
+static void scsi_buf_read_data(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
     SCSIDevice *s = r->req.dev;
     int ret;
 
-    DPRINTF("scsi_read_data 0x%x\n", req->tag);
+    DPRINTF("scsi_buf_read_data 0x%x\n", req->tag);
 
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
@@ -245,13 +245,18 @@ static void scsi_read_data(SCSIRequest *req)
     }
 
     ret = execute_command(s->conf.blk, r, SG_DXFER_FROM_DEV,
-                          scsi_read_complete);
+                          scsi_buf_read_complete);
     if (ret < 0) {
         scsi_command_complete_noio(r, ret);
     }
 }
 
-static void scsi_write_complete(void * opaque, int ret)
+static void scsi_common_read_data(SCSIRequest *req)
+{
+    scsi_buf_read_data(req);
+}
+
+static void scsi_buf_write_complete(void *opaque, int ret)
 {
     SCSIGenericReq *r = (SCSIGenericReq *)opaque;
     SCSIDevice *s = r->req.dev;
@@ -277,13 +282,13 @@ static void scsi_write_complete(void * opaque, int ret)
 
 /* Write data to a scsi device.  Returns nonzero on failure.
    The transfer may complete asynchronously.  */
-static void scsi_write_data(SCSIRequest *req)
+static void scsi_buf_write_data(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
     SCSIDevice *s = r->req.dev;
     int ret;
 
-    DPRINTF("scsi_write_data 0x%x\n", req->tag);
+    DPRINTF("scsi_buf_write_data 0x%x\n", req->tag);
     if (r->len == 0) {
         r->len = r->buflen;
         scsi_req_data(&r->req, r->len);
@@ -292,12 +297,18 @@ static void scsi_write_data(SCSIRequest *req)
 
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
-    ret = execute_command(s->conf.blk, r, SG_DXFER_TO_DEV, scsi_write_complete);
+    ret = execute_command(s->conf.blk, r, SG_DXFER_TO_DEV,
+            scsi_buf_write_complete);
     if (ret < 0) {
         scsi_command_complete_noio(r, ret);
     }
 }
 
+static void scsi_common_write_data(SCSIRequest *req)
+{
+    scsi_buf_write_data(req);
+}
+
 /* Return a pointer to the data buffer.  */
 static uint8_t *scsi_get_buf(SCSIRequest *req)
 {
@@ -311,7 +322,7 @@ static uint8_t *scsi_get_buf(SCSIRequest *req)
    (eg. disk reads), negative for transfers to the device (eg. disk writes),
    and zero if the command does not transfer any data.  */
 
-static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd)
+static int32_t scsi_common_send_command(SCSIRequest *req, uint8_t *cmd)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
     SCSIDevice *s = r->req.dev;
@@ -466,9 +477,9 @@ static void scsi_generic_realize(SCSIDevice *s, Error **errp)
 const SCSIReqOps scsi_generic_req_ops = {
     .size         = sizeof(SCSIGenericReq),
     .free_req     = scsi_free_request,
-    .send_command = scsi_send_command,
-    .read_data    = scsi_read_data,
-    .write_data   = scsi_write_data,
+    .send_command = scsi_common_send_command,
+    .read_data    = scsi_common_read_data,
+    .write_data   = scsi_common_write_data,
     .get_buf      = scsi_get_buf,
     .load_request = scsi_generic_load_request,
     .save_request = scsi_generic_save_request,
-- 
2.6.2

* [Qemu-devel] [PATCH 5/9] scsi-generic: Separate `sg_io_hdr' initializations
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (3 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 4/9] scsi-generic: Add common functions Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 6/9] scsi-generic: Make request execution buf-specific Alex Pyrgiotis
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

Separate the initialization of the `sg_io_hdr' struct into two parts: one
part that fills the struct with sane defaults, and another part that
prepares it for an SG_IO request with DIRECT IO and a single buffer. The
first part can also be reused later on by the code that uses
scatter-gather lists.

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index f24f472..71c0110 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -58,6 +58,30 @@ typedef struct SCSIGenericReq {
     sg_io_hdr_t io_header;
 } SCSIGenericReq;
 
+/* Fill an io header with sane defaults. */
+static void _init_io_header(SCSIGenericReq *r, int direction)
+{
+    r->io_header.interface_id = 'S';
+    r->io_header.dxfer_direction = direction;
+    r->io_header.dxfer_len = r->buflen;
+    r->io_header.cmdp = r->req.cmd.buf;
+    r->io_header.cmd_len = r->req.cmd.len;
+    r->io_header.mx_sb_len = sizeof(r->req.sense);
+    r->io_header.sbp = r->req.sense;
+    r->io_header.timeout = MAX_UINT;
+    r->io_header.usr_ptr = r;
+}
+
+/* Create an io_header for buf* operations. */
+static void scsi_buf_init_io_header(SCSIGenericReq *r, int direction)
+{
+    _init_io_header(r, direction);
+
+    r->io_header.iovec_count = 0;
+    r->io_header.dxferp = r->buf;
+    r->io_header.flags |= SG_FLAG_DIRECT_IO;
+}
+
 static void scsi_generic_save_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
@@ -155,17 +179,7 @@ static int execute_command(BlockBackend *blk,
                            SCSIGenericReq *r, int direction,
                            BlockCompletionFunc *complete)
 {
-    r->io_header.interface_id = 'S';
-    r->io_header.dxfer_direction = direction;
-    r->io_header.dxferp = r->buf;
-    r->io_header.dxfer_len = r->buflen;
-    r->io_header.cmdp = r->req.cmd.buf;
-    r->io_header.cmd_len = r->req.cmd.len;
-    r->io_header.mx_sb_len = sizeof(r->req.sense);
-    r->io_header.sbp = r->req.sense;
-    r->io_header.timeout = MAX_UINT;
-    r->io_header.usr_ptr = r;
-    r->io_header.flags |= SG_FLAG_DIRECT_IO;
+    scsi_buf_init_io_header(r, direction);
 
     r->req.aiocb = blk_aio_ioctl(blk, SG_IO, &r->io_header, complete, r);
     if (r->req.aiocb == NULL) {
-- 
2.6.2

* [Qemu-devel] [PATCH 6/9] scsi-generic: Make request execution buf-specific
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (4 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 5/9] scsi-generic: Separate `sg_io_hdr' initializations Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 7/9] scsi-generic: Make data-copying logic clearer Alex Pyrgiotis
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

Move the request execution logic from execute_command() to
scsi_buf_do_request(), since this particular way of initializing the io
header and performing the ioctl applies only to requests that use an
intermediate buffer.

For now, the above is the only request type, but we need to pave the way
for the support of requests with scatter-gather lists.

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 71c0110..8e2058d 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -175,20 +175,6 @@ static void scsi_command_complete(void *opaque, int ret)
     scsi_command_complete_noio(r, ret);
 }
 
-static int execute_command(BlockBackend *blk,
-                           SCSIGenericReq *r, int direction,
-                           BlockCompletionFunc *complete)
-{
-    scsi_buf_init_io_header(r, direction);
-
-    r->req.aiocb = blk_aio_ioctl(blk, SG_IO, &r->io_header, complete, r);
-    if (r->req.aiocb == NULL) {
-        return -EIO;
-    }
-
-    return 0;
-}
-
 static void scsi_buf_read_complete(void *opaque, int ret)
 {
     SCSIGenericReq *r = (SCSIGenericReq *)opaque;
@@ -242,12 +228,33 @@ static void scsi_buf_read_complete(void *opaque, int ret)
     scsi_req_unref(&r->req);
 }
 
+/*
+ * Execute the request using an intermediate buffer.
+ *
+ * This function does the following:
+ *
+ * a. Initialize the io header for the ioctl request.
+ * b. Perform the ioctl request using blk_aio_ioctl().
+ */
+static void scsi_buf_do_request(SCSIGenericReq *r, int direction,
+        BlockCompletionFunc *complete)
+{
+    SCSIDevice *s = r->req.dev;
+
+    scsi_buf_init_io_header(r, direction);
+
+    r->req.aiocb = blk_aio_ioctl(s->conf.blk, SG_IO,
+            &r->io_header, complete, r);
+
+    if (!r->req.aiocb) {
+        scsi_command_complete_noio(r, -EIO);
+    }
+}
+
 /* Read more data from scsi device into buffer.  */
 static void scsi_buf_read_data(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
-    SCSIDevice *s = r->req.dev;
-    int ret;
 
     DPRINTF("scsi_buf_read_data 0x%x\n", req->tag);
 
@@ -258,11 +265,7 @@ static void scsi_buf_read_data(SCSIRequest *req)
         return;
     }
 
-    ret = execute_command(s->conf.blk, r, SG_DXFER_FROM_DEV,
-                          scsi_buf_read_complete);
-    if (ret < 0) {
-        scsi_command_complete_noio(r, ret);
-    }
+    scsi_buf_do_request(r, SG_DXFER_FROM_DEV, scsi_buf_read_complete);
 }
 
 static void scsi_common_read_data(SCSIRequest *req)
@@ -299,8 +302,6 @@ static void scsi_buf_write_complete(void *opaque, int ret)
 static void scsi_buf_write_data(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
-    SCSIDevice *s = r->req.dev;
-    int ret;
 
     DPRINTF("scsi_buf_write_data 0x%x\n", req->tag);
     if (r->len == 0) {
@@ -311,11 +312,8 @@ static void scsi_buf_write_data(SCSIRequest *req)
 
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
-    ret = execute_command(s->conf.blk, r, SG_DXFER_TO_DEV,
-            scsi_buf_write_complete);
-    if (ret < 0) {
-        scsi_command_complete_noio(r, ret);
-    }
+
+    scsi_buf_do_request(r, SG_DXFER_TO_DEV, scsi_buf_write_complete);
 }
 
 static void scsi_common_write_data(SCSIRequest *req)
@@ -339,8 +337,6 @@ static uint8_t *scsi_get_buf(SCSIRequest *req)
 static int32_t scsi_common_send_command(SCSIRequest *req, uint8_t *cmd)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
-    SCSIDevice *s = r->req.dev;
-    int ret;
 
 #ifdef DEBUG_SCSI
     {
@@ -358,12 +354,7 @@ static int32_t scsi_common_send_command(SCSIRequest *req, uint8_t *cmd)
         r->buf = NULL;
         /* The request is used as the AIO opaque value, so add a ref.  */
         scsi_req_ref(&r->req);
-        ret = execute_command(s->conf.blk, r, SG_DXFER_NONE,
-                              scsi_command_complete);
-        if (ret < 0) {
-            scsi_command_complete_noio(r, ret);
-            return 0;
-        }
+        scsi_buf_do_request(r, SG_DXFER_NONE, scsi_command_complete);
         return 0;
     }
 
-- 
2.6.2

* [Qemu-devel] [PATCH 7/9] scsi-generic: Make data-copying logic clearer
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (5 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 6/9] scsi-generic: Make request execution buf-specific Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 8/9] scsi-generic: Factor out response interception Alex Pyrgiotis
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

The copying of data to/from the intermediate buffer of the device is
done by scsi_req_data(). Internally, scsi_req_data() also restarts the
request with scsi_req_continue(). Therefore, we need a guard variable to
know when the contents of the intermediate buffer are in sync with the
data of the guest kernel, so that we can proceed with the request.

While there is such a guard variable in the `SCSIGenericReq' struct (the
`len' field), its intent is not clear and it is assigned various
values, when only two are necessary: 0 and 1.

Rename the `len' field to `synced' and add an explanation for what it
does.
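
Condensed, the buffered write path then reads as follows (a sketch of
the patched scsi_buf_write_data(), with the two entries annotated):

    static void scsi_buf_write_data(SCSIRequest *req)
    {
        SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);

        /* First entry: the buffer is not yet in sync with the guest
         * data, so copy it over; scsi_req_data() re-enters us. */
        if (!r->synced) {
            r->synced = 1;
            scsi_req_data(req, r->buflen);
            return;
        }

        /* Second entry (via scsi_req_data()): the buffer is in sync,
         * so issue the actual SG_IO request. */
        scsi_req_ref(&r->req);    /* request doubles as the AIO opaque */
        scsi_buf_do_request(r, SG_DXFER_TO_DEV, scsi_buf_write_complete);
    }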

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 8e2058d..6c0cfa5 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -54,7 +54,23 @@ typedef struct SCSIGenericReq {
     SCSIRequest req;
     uint8_t *buf;
     int buflen;
-    int len;
+
+    /*
+     * Note for the `synced' field:
+     *
+     * If the controller does not provide the device with a scatter-gather
+     * list, then the device must create an intermediate buffer for the
+     * request's data.  For write requests, this buffer must be filled with
+     * data from the controller. For read requests, the controller must get the
+     * data from the buffer.
+     *
+     * The above transfers are handled by the scsi_req_data() function. Since
+     * scsi_req_data() effectively restarts the request, we need an indication
+     * so that we don't do the same request twice. This indication is the
+     * `synced' field.
+     */
+    int synced;
+
     sg_io_hdr_t io_header;
 } SCSIGenericReq;
 
@@ -192,7 +208,6 @@ static void scsi_buf_read_complete(void *opaque, int ret)
     len = r->io_header.dxfer_len - r->io_header.resid;
     DPRINTF("Data ready tag=0x%x len=%d\n", r->req.tag, len);
 
-    r->len = -1;
     if (len == 0) {
         scsi_command_complete_noio(r, 0);
         return;
@@ -224,6 +239,7 @@ static void scsi_buf_read_complete(void *opaque, int ret)
             r->buf[3] |= 0x80;
         }
     }
+    r->synced = 1;
     scsi_req_data(&r->req, len);
     scsi_req_unref(&r->req);
 }
@@ -260,7 +276,9 @@ static void scsi_buf_read_data(SCSIRequest *req)
 
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
-    if (r->len == -1) {
+
+    /* If we are here due to scsi_req_data(), we can complete the request. */
+    if (r->synced) {
         scsi_command_complete_noio(r, 0);
         return;
     }
@@ -304,9 +322,14 @@ static void scsi_buf_write_data(SCSIRequest *req)
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
 
     DPRINTF("scsi_buf_write_data 0x%x\n", req->tag);
-    if (r->len == 0) {
-        r->len = r->buflen;
-        scsi_req_data(&r->req, r->len);
+
+    /*
+     * Before performing the write request, we need to transfer the data to
+     * our intermediate buffer.
+     */
+    if (!r->synced) {
+        r->synced = 1;
+        scsi_req_data(req, r->buflen);
         return;
     }
 
@@ -365,9 +388,7 @@ static int32_t scsi_common_send_command(SCSIRequest *req, uint8_t *cmd)
     }
 
     memset(r->buf, 0, r->buflen);
-    r->len = r->req.cmd.xfer;
     if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
-        r->len = 0;
         return -r->req.cmd.xfer;
     } else {
         return r->req.cmd.xfer;
-- 
2.6.2

* [Qemu-devel] [PATCH 8/9] scsi-generic: Factor out response interception
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (6 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 7/9] scsi-generic: Make data-copying logic clearer Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 9/9] scsi-generic: Allow full scatter-gather support Alex Pyrgiotis
  2015-12-16 18:16 ` [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Paolo Bonzini
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

The interception of read/write responses is currently done in the main
code that handles the read/write response. Move the interception logic
into a function of its own, so that it can be reused from the
scatter-gather path.

Also, instead of altering the response buffer directly, use the
scsi_get_buf() function and alter the buffer that it returns. This is
also required for the support of scatter-gather lists.
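
For context, the READ CAPACITY(10) snooping boils down to decoding two
big-endian 32-bit fields from the response. A stand-alone illustration,
where ldl_be() reimplements QEMU's ldl_be_p() just for the example:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t ldl_be(const uint8_t *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }

    int main(void)
    {
        /* 8 GiB disk, 512-byte blocks: bytes 0-3 hold the last LBA,
         * bytes 4-7 the block size. */
        uint8_t buf[8] = { 0x00, 0xff, 0xff, 0xff, 0x00, 0x00, 0x02, 0x00 };

        printf("max_lba=%u blocksize=%u\n", ldl_be(&buf[0]), ldl_be(&buf[4]));
        return 0;
    }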

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 6c0cfa5..6704861 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -98,6 +98,14 @@ static void scsi_buf_init_io_header(SCSIGenericReq *r, int direction)
     r->io_header.flags |= SG_FLAG_DIRECT_IO;
 }
 
+/* Return a pointer to the data buffer.  */
+static uint8_t *scsi_get_buf(SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    return r->buf;
+}
+
 static void scsi_generic_save_request(QEMUFile *f, SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
@@ -127,7 +135,7 @@ static void scsi_free_request(SCSIRequest *req)
     g_free(r->buf);
 }
 
-/* Helper function for command completion.  */
+/* Helper function for command completion. */
 static void scsi_command_complete_noio(SCSIGenericReq *r, int ret)
 {
     int status;
@@ -191,37 +199,23 @@ static void scsi_command_complete(void *opaque, int ret)
     scsi_command_complete_noio(r, ret);
 }
 
-static void scsi_buf_read_complete(void *opaque, int ret)
+/* Intercept the read response in order to snoop or alter it. */
+static void scsi_intercept_read_response(SCSIGenericReq *r)
 {
-    SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+    uint8_t *res_buf;
     SCSIDevice *s = r->req.dev;
-    int len;
 
-    assert(r->req.aiocb != NULL);
-    r->req.aiocb = NULL;
-
-    if (ret || r->req.io_canceled) {
-        scsi_command_complete_noio(r, ret);
-        return;
-    }
-
-    len = r->io_header.dxfer_len - r->io_header.resid;
-    DPRINTF("Data ready tag=0x%x len=%d\n", r->req.tag, len);
-
-    if (len == 0) {
-        scsi_command_complete_noio(r, 0);
-        return;
-    }
+    res_buf = scsi_get_buf(&r->req);
 
     /* Snoop READ CAPACITY output to set the blocksize.  */
     if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
-        (ldl_be_p(&r->buf[0]) != 0xffffffffU || s->max_lba == 0)) {
-        s->blocksize = ldl_be_p(&r->buf[4]);
-        s->max_lba = ldl_be_p(&r->buf[0]) & 0xffffffffULL;
+        (ldl_be_p(&res_buf[0]) != 0xffffffffU || s->max_lba == 0)) {
+        s->blocksize = ldl_be_p(&res_buf[4]);
+        s->max_lba = ldl_be_p(&res_buf[0]) & 0xffffffffULL;
     } else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
                (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
-        s->blocksize = ldl_be_p(&r->buf[8]);
-        s->max_lba = ldq_be_p(&r->buf[0]);
+        s->blocksize = ldl_be_p(&res_buf[8]);
+        s->max_lba = ldq_be_p(&res_buf[0]);
     }
     blk_set_guest_block_size(s->conf.blk, s->blocksize);
 
@@ -234,14 +228,54 @@ static void scsi_buf_read_complete(void *opaque, int ret)
          r->req.cmd.buf[0] == MODE_SENSE_10) &&
         (r->req.cmd.buf[1] & 0x8) == 0) {
         if (r->req.cmd.buf[0] == MODE_SENSE) {
-            r->buf[2] |= 0x80;
+            res_buf[2] |= 0x80;
         } else  {
-            r->buf[3] |= 0x80;
+            res_buf[3] |= 0x80;
         }
     }
-    r->synced = 1;
-    scsi_req_data(&r->req, len);
-    scsi_req_unref(&r->req);
+}
+
+/*
+ * Perform some common checks and return the number of bytes read.
+ *
+ * If we encounter an error, return -1.
+ */
+static int scsi_common_read_complete(SCSIGenericReq *r, int ret)
+{
+    int len;
+
+    assert(r->req.aiocb != NULL);
+    r->req.aiocb = NULL;
+
+    if (ret || r->req.io_canceled) {
+        scsi_command_complete_noio(r, ret);
+        return -1;
+    }
+
+    len = r->io_header.dxfer_len - r->io_header.resid;
+    DPRINTF("Data ready tag=0x%x len=%d\n", r->req.tag, len);
+
+    if (len == 0) {
+        scsi_command_complete_noio(r, 0);
+        return 0;
+    }
+
+    scsi_intercept_read_response(r);
+
+    return len;
+}
+
+static void scsi_buf_read_complete(void *opaque, int ret)
+{
+    int len;
+    SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+
+    len = scsi_common_read_complete(r, ret);
+    if (len > 0) {
+        r->synced = 1;
+        scsi_req_data(&r->req, len);
+        scsi_req_unref(&r->req);
+    }
 }
 
 /*
@@ -291,12 +325,26 @@ static void scsi_common_read_data(SCSIRequest *req)
     scsi_buf_read_data(req);
 }
 
-static void scsi_buf_write_complete(void *opaque, int ret)
+/* Intercept the write response in order to snoop or alter it */
+static void scsi_intercept_write_response(SCSIGenericReq *r)
 {
-    SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+    uint8_t *res_buf;
     SCSIDevice *s = r->req.dev;
 
-    DPRINTF("scsi_write_complete() ret = %d\n", ret);
+    res_buf = scsi_get_buf(&r->req);
+
+    if (r->req.cmd.buf[0] == MODE_SELECT && r->req.cmd.buf[4] == 12 &&
+        s->type == TYPE_TAPE) {
+        s->blocksize = (res_buf[9] << 16) | (res_buf[10] << 8) | res_buf[11];
+        DPRINTF("block size %d\n", s->blocksize);
+    }
+}
+
+static void scsi_common_write_complete(void *opaque, int ret)
+{
+    SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+
+    DPRINTF("scsi_common_write_complete() ret = %d\n", ret);
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
@@ -306,15 +354,17 @@ static void scsi_buf_write_complete(void *opaque, int ret)
         return;
     }
 
-    if (r->req.cmd.buf[0] == MODE_SELECT && r->req.cmd.buf[4] == 12 &&
-        s->type == TYPE_TAPE) {
-        s->blocksize = (r->buf[9] << 16) | (r->buf[10] << 8) | r->buf[11];
-        DPRINTF("block size %d\n", s->blocksize);
-    }
+    scsi_intercept_write_response(r);
 
+    r->req.resid = r->io_header.resid;
     scsi_command_complete_noio(r, ret);
 }
 
+static void scsi_buf_write_complete(void *opaque, int ret)
+{
+    scsi_common_write_complete(opaque, ret);
+}
+
 /* Write data to a scsi device.  Returns nonzero on failure.
    The transfer may complete asynchronously.  */
 static void scsi_buf_write_data(SCSIRequest *req)
@@ -344,14 +394,6 @@ static void scsi_common_write_data(SCSIRequest *req)
     scsi_buf_write_data(req);
 }
 
-/* Return a pointer to the data buffer.  */
-static uint8_t *scsi_get_buf(SCSIRequest *req)
-{
-    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
-
-    return r->buf;
-}
-
 /* Execute a scsi command.  Returns the length of the data expected by the
    command.  This will be Positive for data transfers from the device
    (eg. disk reads), negative for transfers to the device (eg. disk writes),
-- 
2.6.2

* [Qemu-devel] [PATCH 9/9] scsi-generic: Allow full scatter-gather support
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (7 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 8/9] scsi-generic: Factor out response interception Alex Pyrgiotis
@ 2015-12-16 16:55 ` Alex Pyrgiotis
  2015-12-16 18:16 ` [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Paolo Bonzini
  9 siblings, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-16 16:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

If the SCSI controller provides a scatter-gather list, do not copy its
data through an intermediate buffer. Instead, use the list as is via
the dma_blk_ioctl() function.

In order to make this feature possible, the following changes have been
made to the code:

* All I/O functions have been branched into two types of functions:
  "sg_*" and "buf_*", which are used for sg and non-sg operations
  respectively.  The shared code between them is likewise moved into
  "common_*" functions.
* The scsi_req_data() function is, as expected, used only for non-sg
  operations.
* The `sg_io_hdr' struct for scatter-gather operations does not include
  the `SG_FLAG_DIRECT_IO' flag, since it does not work with iovecs, as
  explained here [1].

[1]: http://www.tldp.org/HOWTO/SCSI-Generic-HOWTO/x192.html
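
One subtlety worth noting: in this patch, scsi_sg_do_request() assigns
`dir' only for the two data-transfer directions, which is fine because
SG_DXFER_NONE commands complete early in scsi_common_send_command() and
never reach the sg path. A defensive variant of that mapping could look
like this (an illustrative sketch, assuming QEMU's headers; it is not
part of the posted series):

    static DMADirection sg_to_dma_direction(int direction)
    {
        switch (direction) {
        case SG_DXFER_TO_DEV:
            return DMA_DIRECTION_TO_DEVICE;
        case SG_DXFER_FROM_DEV:
            return DMA_DIRECTION_FROM_DEVICE;
        default:
            /* No data phase: the sg path should never get here. */
            abort();
        }
    }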

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 6704861..5f3c401 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -16,6 +16,7 @@
 #include "hw/scsi/scsi.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/dma.h"
 
 #ifdef __linux__
 
@@ -98,12 +99,57 @@ static void scsi_buf_init_io_header(SCSIGenericReq *r, int direction)
     r->io_header.flags |= SG_FLAG_DIRECT_IO;
 }
 
-/* Return a pointer to the data buffer.  */
+/*
+ * Create an io_header for sg* operations.
+ *
+ * Note: The rest of the fields are filled when scsi_sg_handle_iov() is called.
+ */
+static void scsi_sg_init_io_header(SCSIGenericReq *r, int direction)
+{
+    _init_io_header(r, direction);
+}
+
+/*
+ * Handle a QEMUIOVector, as created by the dma_blk_ioctl() function.
+ *
+ * As mentioned here [1], in order to use an iovec for the ioctl operation, the
+ * iovec_count field must be filled with the number of vectors, the dxferp
+ * field must point to the iovecs and the dxfer_len field must state the size
+ * of the request, which should amount to the sum of iov_lens.
+ *
+ * [1] http://sg.danny.cz/sg/p/sg_v3_ho.html#id2495162
+ */
+static void scsi_sg_handle_iov(void *opaque, QEMUIOVector *iov)
+{
+    SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+
+    r->io_header.iovec_count = iov->niov;
+    r->io_header.dxferp = iov->iov;
+    r->io_header.dxfer_len = iov->size;
+}
+
+/*
+ * Return a pointer to the request's data buffer.
+ *
+ * If the SCSI controller does not provide a scatter-gather list, then the data
+ * buffer of the request is stored in the `buf' field of SCSIGenericReq.
+ *
+ * Else, the data buffer is mapped to a list of iovecs and this function is
+ * never called externally. If called internally, it will be solely for the
+ * manipulation of the first 9 bytes of data of a SCSI response. In this case,
+ * we can safely return a pointer to the data of the first iovec.
+ */
 static uint8_t *scsi_get_buf(SCSIRequest *req)
 {
+    struct iovec *iov;
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
 
-    return r->buf;
+    if (req->sg) {
+        iov = (struct iovec *)r->io_header.dxferp;
+        return iov->iov_base;
+    } else {
+        return r->buf;
+    }
 }
 
 static void scsi_generic_save_request(QEMUFile *f, SCSIRequest *req)
@@ -265,6 +311,18 @@ static int scsi_common_read_complete(SCSIGenericReq *r, int ret)
     return len;
 }
 
+static void scsi_sg_read_complete(void *opaque, int ret)
+{
+    int len;
+    SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+
+    len = scsi_common_read_complete(r, ret);
+    if (len > 0) {
+        r->req.resid = r->io_header.resid;
+        scsi_command_complete_noio(r, 0);
+    }
+}
+
 static void scsi_buf_read_complete(void *opaque, int ret)
 {
     int len;
@@ -279,6 +337,39 @@ static void scsi_buf_read_complete(void *opaque, int ret)
 }
 
 /*
+ * Execute the request using the provided scatter-gather list.
+ *
+ * This function does the following:
+ *
+ * a. Initialize the io header for the ioctl request.
+ * b. Derive the DMA direction from the provided request direction.
+ * c. Perform the ioctl request using dma_blk_ioctl() from dma-helpers. Also,
+ *    register a callback for the handling of the resulting iovector from the
+ *    internal mapping of the scatter-gather list.
+ */
+static void scsi_sg_do_request(SCSIGenericReq *r, int direction,
+        BlockCompletionFunc *complete)
+{
+    SCSIDevice *s = r->req.dev;
+    DMADirection dir;
+
+    scsi_sg_init_io_header(r, direction);
+
+    if (direction == SG_DXFER_TO_DEV) {
+        dir = DMA_DIRECTION_TO_DEVICE;
+    } else if (direction == SG_DXFER_FROM_DEV) {
+        dir = DMA_DIRECTION_FROM_DEVICE;
+    }
+
+    r->req.aiocb = dma_blk_ioctl(s->conf.blk, SG_IO, &r->io_header, r->req.sg,
+            dir, scsi_sg_handle_iov, complete, r);
+
+    if (!r->req.aiocb) {
+        scsi_command_complete_noio(r, -EIO);
+    }
+}
+
+/*
  * Execute the request using an intermediate buffer.
  *
  * This function does the following:
@@ -301,7 +392,18 @@ static void scsi_buf_do_request(SCSIGenericReq *r, int direction,
     }
 }
 
-/* Read more data from scsi device into buffer.  */
+static void scsi_sg_read_data(SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    DPRINTF("scsi_sg_read_data 0x%x\n", req->tag);
+
+    /* The request is used as the AIO opaque value, so add a ref.  */
+    scsi_req_ref(&r->req);
+
+    scsi_sg_do_request(r, SG_DXFER_FROM_DEV, scsi_sg_read_complete);
+}
+
 static void scsi_buf_read_data(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
@@ -322,7 +424,11 @@ static void scsi_buf_read_data(SCSIRequest *req)
 
 static void scsi_common_read_data(SCSIRequest *req)
 {
-    scsi_buf_read_data(req);
+    if (req->sg) {
+        scsi_sg_read_data(req);
+    } else {
+        scsi_buf_read_data(req);
+    }
 }
 
 /* Intercept the write response in order to snoop or alter it */
@@ -360,6 +466,11 @@ static void scsi_common_write_complete(void *opaque, int ret)
     scsi_command_complete_noio(r, ret);
 }
 
+static void scsi_sg_write_complete(void *opaque, int ret)
+{
+    scsi_common_write_complete(opaque, ret);
+}
+
 static void scsi_buf_write_complete(void *opaque, int ret)
 {
     scsi_common_write_complete(opaque, ret);
@@ -367,6 +478,20 @@ static void scsi_buf_write_complete(void *opaque, int ret)
 
 /* Write data to a scsi device.  Returns nonzero on failure.
    The transfer may complete asynchronously.  */
+static void scsi_sg_write_data(SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    DPRINTF("scsi_sg_write_data 0x%x\n", req->tag);
+
+    /* The request is used as the AIO opaque value, so add a ref.  */
+    scsi_req_ref(&r->req);
+
+    scsi_sg_do_request(r, SG_DXFER_TO_DEV, scsi_sg_write_complete);
+}
+
+/* Write data to a scsi device.  Returns nonzero on failure.
+   The transfer may complete asynchronously.  */
 static void scsi_buf_write_data(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
@@ -374,8 +499,8 @@ static void scsi_buf_write_data(SCSIRequest *req)
     DPRINTF("scsi_buf_write_data 0x%x\n", req->tag);
 
     /*
-     * Before performing the write request, we need to transfer the data to
-     * our intermediate buffer.
+     * If the controller does not provide a scatter-gather list, we need to
+     * sync the controller's data with the request's intermediate buffer.
      */
     if (!r->synced) {
         r->synced = 1;
@@ -391,7 +516,11 @@ static void scsi_buf_write_data(SCSIRequest *req)
 
 static void scsi_common_write_data(SCSIRequest *req)
 {
-    scsi_buf_write_data(req);
+    if (req->sg) {
+        scsi_sg_write_data(req);
+    } else {
+        scsi_buf_write_data(req);
+    }
 }
 
 /* Execute a scsi command.  Returns the length of the data expected by the
@@ -413,6 +542,9 @@ static int32_t scsi_common_send_command(SCSIRequest *req, uint8_t *cmd)
     }
 #endif
 
+    /*
+     * SCSI commands that have nothing to transfer can be executed immediately.
+     */
     if (r->req.cmd.xfer == 0) {
         g_free(r->buf);
         r->buflen = 0;
@@ -423,13 +555,16 @@ static int32_t scsi_common_send_command(SCSIRequest *req, uint8_t *cmd)
         return 0;
     }
 
-    if (r->buflen != r->req.cmd.xfer) {
-        g_free(r->buf);
-        r->buf = g_malloc(r->req.cmd.xfer);
-        r->buflen = r->req.cmd.xfer;
+    r->buflen = r->req.cmd.xfer;
+
+    /*
+     * If the controller does not support scatter-gather lists, allocate an
+     * intermediate buffer that can hold the request data.
+     */
+    if (!req->sg) {
+        r->buf = g_malloc0(r->buflen);
     }
 
-    memset(r->buf, 0, r->buflen);
     if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
         return -r->req.cmd.xfer;
     } else {
-- 
2.6.2

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
                   ` (8 preceding siblings ...)
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 9/9] scsi-generic: Allow full scatter-gather support Alex Pyrgiotis
@ 2015-12-16 18:16 ` Paolo Bonzini
  2015-12-17  8:47   ` Alex Pyrgiotis
  9 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2015-12-16 18:16 UTC (permalink / raw)
  To: Alex Pyrgiotis, qemu-devel



On 16/12/2015 17:55, Alex Pyrgiotis wrote:
> Hi all,
> 
> This patch series attempts to boost the performance of the
> "scsi-generic" and "scsi-block" device types by removing an extra data
> copy and reducing their memory footprint. More specifically, the
> problem lies in the functions of the `scsi_generic_req_ops` struct in
> scsi-generic.c. These functions rely on an intermediate buffer to
> perform the SG_IO ioctl request, without checking whether the SCSI
> controller has provided a scatter-gather list with the request.
> 
> In a nutshell, our proposal is to map the provided scatter-gather list
> (if any) to the address space of the QEMU process and use the resulting
> iovec as the buffer for the ioctl request. You'll find that the logic is
> quite similar to the one used in scsi-disk.c.

Which commands have large payloads and are on the data path, for
scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?

(Just trying to understand before I dive into the patches).

Paolo

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-16 18:16 ` [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Paolo Bonzini
@ 2015-12-17  8:47   ` Alex Pyrgiotis
  2015-12-17 10:31     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-17  8:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Hi Paolo,

On 12/16/2015 08:16 PM, Paolo Bonzini wrote:
> 
> 
> On 16/12/2015 17:55, Alex Pyrgiotis wrote:
>> Hi all,
>>
>> This patch series attempts to boost the performance of the
>> "scsi-generic" and "scsi-block" device types by removing an extra data
>> copy and reducing their memory footprint. More specifically, the
>> problem lies in the functions of the `scsi_generic_req_ops` struct in
>> scsi-generic.c. These functions rely on an intermediate buffer to
>> perform the SG_IO ioctl request, without checking whether the SCSI
>> controller has provided a scatter-gather list with the request.
>>
>> In a nutshell, our proposal is to map the provided scatter-gather list
>> (if any) to the address space of the QEMU process and use the resulting
>> iovec as the buffer for the ioctl request. You'll find that the logic is
>> quite similar to the one used in scsi-disk.c.
> 
> Which commands have large payloads and are on the data path, for
> scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?
> 
> (Just trying to understand before I dive into the patches).

Sure, no problem. The commands that have large payloads and are on the
data path are the classic SCSI READ/WRITE commands. Usually, these
commands are implemented with vectored reads/writes, which utilize the
controller's scatter-gather list.

However, when opening a "scsi-block" device with the default cache
policy (cache=writeback), QEMU falls back to the "scsi-generic"
functions (i.e., SG_IO ioctl requests) for reading/writing data [1]. In
this case, the data are copied into a bounce buffer, which is the issue
that this patch series tackles.

Thanks,
Alex


[1]: I'll quote the comment in the code for the rationale behind this
choice:

	"If we are not using O_DIRECT, we might read stale data from
    	the host cache if writes were made using other commands than 	
	these ones (such as WRITE SAME or EXTENDED COPY, etc.).  So,
	without O_DIRECT everything must go through SG_IO."

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-17  8:47   ` Alex Pyrgiotis
@ 2015-12-17 10:31     ` Paolo Bonzini
  2015-12-17 13:10       ` Alex Pyrgiotis
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2015-12-17 10:31 UTC (permalink / raw)
  To: Alex Pyrgiotis; +Cc: qemu-devel



On 17/12/2015 09:47, Alex Pyrgiotis wrote:
>> > Which commands have large payloads and are on the data path, for
>> > scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?
>> > 
>> > (Just trying to understand before I dive into the patches).
> Sure, no problem. The commands that have large payloads and are on the
> data path are the classic SCSI READ/WRITE commands. Usually, these
> commands are implemented with vectored reads/writes, which utilize the
> controller's scatter-gather list.
> 
> However, when opening a "scsi-block" device with the default cache
> policy (cache=writeback), QEMU falls back to the "scsi-generic" functions
> (i.e., SG_IO ioctl requests) for reading/writing data [1]. In this case,
> the data are copied into a bounce buffer, which is the issue that this
> patch tackles.

Right, I forgot about that.  However, falling back to scsi-generic
effectively means that scsi-block is always O_DIRECT/cache=none.  So why
not just specify cache=none?
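
For example, something along these lines (just a sketch; the drive and
device ids below are made up):

    qemu-system-x86_64 \
        -device virtio-scsi-pci,id=scsi0 \
        -drive file=/dev/sdb,if=none,id=d0,format=raw,cache=none \
        -device scsi-block,drive=d0,bus=scsi0.0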

We can improve the code to print a warning if you don't.  (It needs some
care: iscsi never caches, independent of the cache= argument, so we
don't want to warn for it.  But it can be done).

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-17 10:31     ` Paolo Bonzini
@ 2015-12-17 13:10       ` Alex Pyrgiotis
  2015-12-17 13:13         ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-17 13:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Hi Paolo,

On 12/17/2015 12:31 PM, Paolo Bonzini wrote:
> 
> 
> On 17/12/2015 09:47, Alex Pyrgiotis wrote:
>>>> Which commands have large payloads and are on the data path, for
>>>> scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?
>>>>
>>>> (Just trying to understand before I dive into the patches).
>> Sure, no problem. The commands that have large payloads and are on the
>> data path are the classic SCSI READ/WRITE commands. Usually, these
>> commands are implemented with vectored reads/writes, which utilize the
>> controller's scatter-gather list.
>>
>> However, when opening a "scsi-block" device with the default cache
>> policy (cache=writeback), QEMU falls back to the "scsi-generic" functions
>> (i.e., SG_IO ioctl requests) for reading/writing data [1]. In this case,
>> the data are copied into a bounce buffer, which is the issue that this
>> patch tackles.
> 
> Right, I forgot about that.  However, falling back to scsi-generic
> effectively means that scsi-block is always O_DIRECT/cache=none.  So why
> not just specify cache=none?

If I understand correctly, what you're saying is that if "scsi-block" is
started with "cache=writeback" and internally uses ioctl()s to bypass
the page cache, why not set "cache=none" beforehand and use
readv()/writev()?

This is a valid suggestion, but this patch does not target only the
"scsi-block" device type. Its purpose is to allow faster read/writes via
ioctl()s, either to a "scsi-block" device or to a "scsi-generic" device.
Note that the latter device type can only use ioctl()s, so it cannot
benefit from the readv()/writev() DMA interface and currently has to use
a bounce buffer.

> We can improve the code to print a warning if you don't.  (It needs some
> care: iscsi never caches, independent of the cache= argument, so we
> don't want to warn for it.  But it can be done).

I wasn't particularly concerned about that issue. I'd prefer it if this
were explicitly addressed in the QEMU docs, under the "cache=" section,
but that's a different discussion.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-17 13:10       ` Alex Pyrgiotis
@ 2015-12-17 13:13         ` Paolo Bonzini
  2015-12-21 10:58           ` Alex Pyrgiotis
  2016-01-11 13:30           ` Alex Pyrgiotis
  0 siblings, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2015-12-17 13:13 UTC (permalink / raw)
  To: Alex Pyrgiotis; +Cc: qemu-devel



On 17/12/2015 14:10, Alex Pyrgiotis wrote:
>>>>> Which commands have large payloads and are on the data path, for
>>>>> scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?
>
> If I understand correctly, what you're saying is that if "scsi-block" is
> started with "cache=writeback" and internally uses ioctl()s to bypass
> the page cache, why not set "cache=none" beforehand and use
> readv()/writev()?
> 
> This is a valid suggestion, but this patch does not target only the
> "scsi-block" device type. Its purpose is to allow faster read/writes via
> ioctl()s, either to a "scsi-block" device or to a "scsi-generic" device.
> Note that the latter device type can only use ioctl()s, so it cannot
> benefit from the readv()/writev() DMA interface and currently has to use
> a bounce buffer.

Okay, so that answers my questions; there is still a valid use case for
e.g. tape devices, and of course for when someone forgets to use scsi-block.

Paolo

>> We can improve the code to print a warning if you don't.  (It needs some
>> care: iscsi never caches, independent of the cache= argument, so we
>> don't want to warn for it.  But it can be done).
> 
> I wasn't particularly concerned about that issue. I'd prefer it if this
> were explicitly addressed in the QEMU docs, under the "cache=" section,
> but that's a different discussion.
> 
> Thanks,
> Alex
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-17 13:13         ` Paolo Bonzini
@ 2015-12-21 10:58           ` Alex Pyrgiotis
  2016-01-11 13:30           ` Alex Pyrgiotis
  1 sibling, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2015-12-21 10:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Hi Paolo,

On 12/17/2015 03:13 PM, Paolo Bonzini wrote:
> 
> 
> On 17/12/2015 14:10, Alex Pyrgiotis wrote:
>>>>>> Which commands have large payloads and are on the data path, for
>>>>>> scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?
>>
>> If I understand correctly, what you're saying is that if "scsi-block" is
>> started with "cache=writeback" and internally uses ioctl()s to bypass
>> the page cache, why not set "cache=none" beforehand and use
>> readv()/writev()?
>>
>> This is a valid suggestion, but this patch does not target only the
>> "scsi-block" device type. Its purpose is to allow faster read/writes via
>> ioctl()s, either to a "scsi-block" device or to a "scsi-generic" device.
>> Note that the latter device type can only use ioctl()s, so it cannot
>> benefit from the readv()/writev() DMA interface and currently has to use
>> a bounce buffer.
> 
> Okay, so that answers my questions; there is still a valid use case for
> e.g. tape devices, and of course for when someone forgets to use scsi-block.

Nice. I'm looking forward to your comments and I'll reply promptly.

Thanks,
Alex

>>> We can improve the code to print a warning if you don't.  (It needs some
>>> care: iscsi never caches, independent of the cache= argument, so we
>>> don't want to warn for it.  But it can be done).
>>
>> I wasn't particularly concerned about that issue. I'd prefer it if this
>> were explicitly addressed in the QEMU docs, under the "cache=" section,
>> but that's a different discussion.
>>
>> Thanks,
>> Alex
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices
  2015-12-17 13:13         ` Paolo Bonzini
  2015-12-21 10:58           ` Alex Pyrgiotis
@ 2016-01-11 13:30           ` Alex Pyrgiotis
  1 sibling, 0 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2016-01-11 13:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 12/17/2015 03:13 PM, Paolo Bonzini wrote:
> 
> 
> On 17/12/2015 14:10, Alex Pyrgiotis wrote:
>>>>>> Which commands have large payloads and are on the data path, for
>>>>>> scsi-block?  Or is the use case just scsi-generic (e.g. tape devices?)?
>>
>> If I understand correctly, what you're saying is that if "scsi-block" is
>> started with "cache=writeback" and internally uses ioctl()s to bypass
>> the page cache, why not set "cache=none" beforehand and use
>> readv()/writev()?
>>
>> This is a valid suggestion, but this patch does not target only the
>> "scsi-block" device type. Its purpose is to allow faster read/writes via
>> ioctl()s, either to a "scsi-block" device or to a "scsi-generic" device.
>> Note that the latter device type can only use ioctl()s, so it cannot
>> benefit from the readv()/writev() DMA interface and currently has to use
>> a bounce buffer.
> 
> Okay, so that answers my questions; there is still a valid use case for
> e.g. tape devices, and of course for when someone forgets to use scsi-block.
> 
> Paolo

Ping?

Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs Alex Pyrgiotis
@ 2016-02-11 11:08   ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-11 11:08 UTC (permalink / raw)
  To: Alex Pyrgiotis, qemu-devel, Kevin Wolf



On 16/12/2015 17:55, Alex Pyrgiotis wrote:
> If the size of the qiov is smaller than the sector size, do not truncate
> the qiov, which would effectively make it empty. Instead, allow it to
> pass as is.
> 
> This is necessary for SCSI requests like READ CAPACITY which have small
> buffers, e.g. 32 bytes.
> 
> Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>
> Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
> 
> diff --git a/dma-helpers.c b/dma-helpers.c
> index e1ea7b3..b8f2ae0 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -162,7 +162,16 @@ static void dma_map_sg(DMAAIOCB *dbs)
>          return;
>      }
>  
> -    if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
> +    /*
> +     * If the size of the qiov is not a multiple of the sector size, truncate
> +     * the qiov.
> +     *
> +     * NOTE: If the qiov is less than a sector, we can assume that there is a
> +     * reason for it, e.g., a SCSI request such as READ CAPACITY, and we should
> +     * not truncate it.
> +     */

I'm afraid this is wrong.  It is legal to send SCSI requests that are
e.g. 513 bytes in size.  The sector limit is arbitrary.

I think the "if" statement is wrong too, however.  In particular it has
not broken IDE TRIM only because the ATA standard makes 512 bytes the
unit of DMA transfer.  Rather, it is the IDE device that should discard
extra data in the QEMUSGList.

Paolo

> +    if (dbs->iov.size & ~BDRV_SECTOR_MASK &&
> +            dbs->iov.size > BDRV_SECTOR_SIZE) {
>          qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
>      }
>  }
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2015-12-16 16:55 ` [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic Alex Pyrgiotis
@ 2016-02-11 11:17   ` Paolo Bonzini
  2016-02-19 11:50     ` Alex Pyrgiotis
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-11 11:17 UTC (permalink / raw)
  To: Alex Pyrgiotis, qemu-devel



On 16/12/2015 17:55, Alex Pyrgiotis wrote:
> +/*
> + * Create a QEMUIOVector from a scatter-gather list.
> + *
> + * This function does not copy the data of the scatter-gather list. Instead, it
> + * uses the dma_memory_map() function to map physical memory regions of the
> + * virtual device (as interpreted by the guest kernel) into the address space
> + * of the QEMU process, in order to have access to the data.
> + */
> +static void dma_map_sg(DMAAIOCB *dbs)

In special cases where the QEMUSGList includes MMIO regions, dma_map_sg
might not be able to map the whole list.  In this case, for regular I/O
it is possible to break the operation in multiple steps---in fact, this
breaking of requests is the main purpose of most of the code in
dma-helpers.c.

However, it is not possible to do the same for ioctls.  This is actually
the reason why no one has ever tried to make scsi-generic do anything
but bounce-buffering.  I think that your code breaks horribly in this
case, and I don't see a way to fix it, except for reverting to bounce
buffering.

This would require major changes in your patches, and I'm not sure
whether they are worth it for the single use case of tape devices...

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-11 11:17   ` Paolo Bonzini
@ 2016-02-19 11:50     ` Alex Pyrgiotis
  2016-02-22 10:43       ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Pyrgiotis @ 2016-02-19 11:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Hi Paolo,

Sorry for the late reply and thanks for the review. See my comments inline:

On 02/11/2016 01:17 PM, Paolo Bonzini wrote:
> 
> 
> On 16/12/2015 17:55, Alex Pyrgiotis wrote:
>> +/*
>> + * Create a QEMUIOVector from a scatter-gather list.
>> + *
>> + * This function does not copy the data of the scatter-gather list. Instead, it
>> + * uses the dma_memory_map() function to map physical memory regions of the
>> + * virtual device (as interpreted by the guest kernel) into the address space
>> + * of the QEMU process, in order to have access to the data.
>> + */
>> +static void dma_map_sg(DMAAIOCB *dbs)
> 
> In special cases where the QEMUSGList includes MMIO regions, dma_map_sg
> might not be able to map the whole list.  In this case, for regular I/O
> it is possible to break the operation in multiple steps---in fact, this
> breaking of requests is the main purpose of most of the code in
> dma-helpers.c.

You're right. I've written a comment about that, yet somehow the
implications of rescheduling a DMA operation flew over my head. Great.

So, I've tried to grasp the current situation and this is what I've got
so far. Please correct me where I'm wrong.


Guest kernel space:
1. The guest kernel wants to send a SCSI request to a (para)virtual SCSI
   controller.
2. This SCSI request involves a list of physical address ranges.
   Whether the request only refers to a single contiguous physical
   range, or whether it is a scatter-gather list depends on the
   implementation of the kernel's Low-level Device Driver (LLDD) and
   the capabilities of the SCSI controller.
3. Some of these ranges may include MMIO regions. Since I'm not aware
   of a scenario where this happens, can you provide an example?
4. The LLDD writes the SCSI CDB to the SCSI controller's queue and
   informs the SCSI controller by writing to some sort of doorbell
   register.

QEMU/Hardware space:
5. The SCSI controller code will create a QEMUSGList that points to
   the memory regions of the SCSI request. This QEMUSGList will also
   include the MMIO regions.
6. The QEMU device implementation, e.g. scsi-block, chooses to use
   the dma_* interface.
7. The dma_blk_read/write() code will ultimately attempt to map all the
   memory regions pointed to by the QEMUSGList in order to create a
   QEMUIOVector.
8. At some point during the mapping loop, the code will encounter an
   MMIO region. Since reading and writing from/to an MMIO region
   requires special handling, e.g., we need to call
   MemoryRegion->ops->write(), we cannot include it in our read/write
   system call to the host kernel.
9. This leads to a partial read/write and the mapping loop will resume
   once the partial read/write() has finished.


Are we on the same page so far? If so, let's dig into the
dma_blk_read/write() logic. The following is simplified pseudocode:


	allocate QEMUIOVector
callback:

	# Reset any changes to QEMUIOVector from previous operations
	for address in QEMUIOVector:
		unmap address
	reset QEMUIOVector

	# Check if the previous IO operation was the last one and if so,
	# call the completion callback of the user and return.
	if operation is completed:
		dma_complete()
		return

	offset = adjust read/write offset
	# Map each sg of QEMUSGList and add it to the QEMUIOVector
	for sg in QEMUSGList:
		address = dma_memory_map(sg)
		if address == NULL:
			break
		add address to QEMUIOVector

	# When either of the following operations have finished, restart
	# this DMA operation.
	if nothing mapped:
		# If we can't map anything, we need to retry
		create reschedule_dma callback
		cpu_register_map_client(callback)
	else:
		# Else, we can perform a partial read/write()
		do readv/writev(offset, QEMUIOVector, callback)
	

Are the above OK? If so, I have some questions:

a) Is an MMIO region one of the reasons why we can't map an sg?
b) At which point will the relevant ops->write() method for the MMIO
   region be called when we have to DMA into the region? Is it handled
   implicitly in dma_memory_map()?
c) I'm not quite sure about the logic of the "nothing mapped" section.
   Correct me if I'm wrong, but what I think it does is that it
   registers a callback (reschedule_dma) once some sort of mapping has
   completed. What kind of mapping is this? Is there anything more to
   it?

> However, it is not possible to do the same for ioctls.  This is actually
> the reason why no one has ever tried to make scsi-generic do anything
> but bounce-buffering. I think that your code breaks horribly in this
> case, and I don't see a way to fix it, except for reverting to bounce
> buffering.
>
> This would require major changes in your patches, and I'm not sure
> whether they are worth it for the single use case of tape devices...

Well, I wouldn't narrow it down to tape devices. The scsi-generic
interface is the universal interface for all SCSI devices and the only
interface that is fully passthrough. So this patch would effectively
boost the baseline performance of SCSI devices. I think it's worth a try.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-19 11:50     ` Alex Pyrgiotis
@ 2016-02-22 10:43       ` Paolo Bonzini
  2016-02-25 10:10         ` Alex Pyrgiotis
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-22 10:43 UTC (permalink / raw)
  To: Alex Pyrgiotis; +Cc: qemu-devel



On 19/02/2016 12:50, Alex Pyrgiotis wrote:
> QEMU/Hardware space:
> 5. The SCSI controller code will create a QEMUSGList that points to
>    the memory regions of the SCSI request. This QEMUSGList will also
>    include the MMIO regions.
> 6. The QEMU device implementation, e.g. scsi-block, chooses to use
>    the dma_* interface.
> 7. The dma_blk_read/write() code will ultimately attempt to map all the
>    memory regions pointed by the QEMUSGList in order to create a
>    QEMUIOVector.
> 8. At some point during the mapping loop, the code will encounter an
>    MMIO region. Since reading and writing from/to an MMIO region
>    requires  special handling, e.g., we need to call
>    MemoryRegion->ops->write(), we cannot include it in our read/write
>    system call to the host kernel.
> 9. This leads to a partial read/write and the mapping loop will resume
>    once the partial read/write() has finished.
> 
> Are we on the same page so far?

Yes.

> Are the above OK? If so, I have some questions:
> 
> a) Is an MMIO region one of the reasons why we can't map an sg?

Yes, the only one pretty much.

> b) At which point will the relevant ops->write() method for the MMIO
>    region be called when we have to DMA into the region? Is it handled
>    implicitly in dma_memory_map()?

It's in address_space_unmap:

    if (is_write) {
        address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED,
                            bounce.buffer, access_len);
    }

Likewise, address_space_map does the ops->read call through
address_space_read.

> c) I'm not quite sure about the logic of the "nothing mapped" section.
>    Correct me if I'm wrong, but what I think it does is that it
>    registers a callback (reschedule_dma) once some sort of mapping has
>    completed. What kind of mapping is this? Is there anything more to
>    it?

Once something (presumably a concurrent user of dma-helpers.c) calls
address_space_unmap to free the mapping (the bounce.buffer in the above
address_space_write call), reschedule_dma is called.
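
The registration side is in dma_blk_cb, and looks roughly like this
(paraphrasing from memory, not quoting verbatim):

    if (dbs->iov.size == 0) {
        /* nothing could be mapped: wait for the bounce buffer to be freed */
        dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
                             reschedule_dma, dbs);
        cpu_register_map_client(dbs->bh);
        return;
    }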

>> However, it is not possible to do the same for ioctls.  This is actually
>> the reason why no one has ever tried to make scsi-generic do anything
>> but bounce-buffering. I think that your code breaks horribly in this
>> case, and I don't see a way to fix it, except for reverting to bounce
>> buffering.
>>
>> This would require major changes in your patches, and I'm not sure
>> whether they are worth it for the single use case of tape devices...
> 
> Well, I wouldn't narrow it down to tape devices. The scsi-generic
> interface is the universal interface for all SCSI devices and the only
> interface that is fully passthrough.

Sure, but what's the advantage of a fully passthrough interface over
scsi-block?

> So this patch would effectively
> boost the baseline performance of SCSI devices. I think it's worth a try.

I think the first step is understanding what to do about the weird "&
~BDRV_SECTOR_MASK" case, then.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-22 10:43       ` Paolo Bonzini
@ 2016-02-25 10:10         ` Alex Pyrgiotis
  2016-02-25 10:22           ` Paolo Bonzini
  2016-02-26  9:20           ` Kevin Wolf
  0 siblings, 2 replies; 26+ messages in thread
From: Alex Pyrgiotis @ 2016-02-25 10:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, famz, qemu-devel

Hi Paolo,

Thanks a lot for your clarifications. See my comments inline:

(tl;dr: I suggest we reconsider Fam Zheng's attempt to remove the global
bounce buffer, which would make dma-helpers simpler and unblock this patch)

On 02/22/2016 12:43 PM, Paolo Bonzini wrote:
> 
> 
> On 19/02/2016 12:50, Alex Pyrgiotis wrote:
>> QEMU/Hardware space:
>> 5. The SCSI controller code will create a QEMUSGList that points to
>>    the memory regions of the SCSI request. This QEMUSGList will also
>>    include the MMIO regions.
>> 6. The QEMU device implementation, e.g. scsi-block, chooses to use
>>    the dma_* interface.
>> 7. The dma_blk_read/write() code will ultimately attempt to map all the
>>    memory regions pointed by the QEMUSGList in order to create a
>>    QEMUIOVector.
>> 8. At some point during the mapping loop, the code will encounter an
>>    MMIO region. Since reading and writing from/to an MMIO region
>>    requires  special handling, e.g., we need to call
>>    MemoryRegion->ops->write(), we cannot include it in our read/write
>>    system call to the host kernel.

This step and the next one were not clear to me, but thanks to your
comments, I now get what's happening behind the scenes. So, let's reiterate:

All normal regions in a QEMUSGList point to an address range in the
guest's RAM. The MMIO regions of QEMU's virtual devices, however, do not
correspond to such an address range, so QEMU must create a bounce buffer
to represent them. This bounce buffer is added to the I/O vector, which
contains the rest of the mapped addresses, and is later given to a
readv()/writev() call.
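
For reference, the mapping loop in dma-helpers.c goes roughly like this
(paraphrasing dma_blk_cb from memory, not quoting it verbatim):

    while (dbs->sg_cur_index < dbs->sg->nsg) {
        cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
        cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
        mem = dma_memory_map(dbs->sg->as, cur_addr, &cur_len, dbs->dir);
        if (!mem) {
            break;    /* e.g. an MMIO range while the bounce buffer is busy */
        }
        qemu_iovec_add(&dbs->iov, mem, cur_len);
        dbs->sg_cur_byte += cur_len;
        if (dbs->sg_cur_byte == dbs->sg->sg[dbs->sg_cur_index].len) {
            dbs->sg_cur_byte = 0;
            ++dbs->sg_cur_index;
        }
    }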

>> 9. This leads to a partial read/write and the mapping loop will resume
>>    once the partial read/write() has finished.

The MMIO region is the trigger for a partial read/write, but it's not
the actual reason. The actual reason is that there is only *one*
*global* bounce buffer. This means that if it's in use or we need to
use it twice, we will have to wait.

>> Are we on the same page so far?
> 
> Yes.
> 
>> Are the above OK? If so, I have some questions:
>>
>> a) Is an MMIO region one of the reasons why we can't map an sg?
> 
> Yes, the only one pretty much.
> 
>> b) At which point will the relevant ops->write() method for the MMIO
>>    region be called when we have to DMA into the region? Is it handled
>>    implicitly in dma_memory_map()?
> 
> It's in address_space_unmap:
> 
>     if (is_write) {
>         address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED,
>                             bounce.buffer, access_len);
>     }
> 
> Likewise, address_space_map does the ops->read call through
> address_space_read.
> 
>> c) I'm not quite sure about the logic of the "nothing mapped" section.
>>    Correct me if I'm wrong, but what I think it does is that it
>>    registers a callback (reschedule_dma) once some sort of mapping has
>>    completed. What kind of mapping is this? Is there anything more to
>>    it?
> 
> Once something (presumably a concurrent user of dma-helpers.c) calls
> address_space_unmap to free the mapping (the bounce.buffer in the above
> address_space_write call), reschedule_dma is called.
>
>>> However, it is not possible to do the same for ioctls.  This is actually
>>> the reason why no one has ever tried to make scsi-generic do anything
>>> but bounce-buffering. I think that your code breaks horribly in this
>>> case, and I don't see a way to fix it, except for reverting to bounce
>>> buffering.

Sure, you're right, there's no sensible way to break an ioctl()
operation into many parts. However, I'd argue that we shouldn't need to, as it
would be much better if the DMA operation was never restarted in the
first place. Instead, if we dealt with the bigger issue of the global
bounce buffer, we could kill two birds with one stone.

I see that there was an attempt [1] to replace the global bounce buffer
with something more dynamic. In short, the objections [2] were the
following:

1. It introduced locking/unlocking a global mutex in the hot path as
   well as a hash table lookup.
2. It allowed for unbounded memory allocations.

An improvement that would address (1) is to get rid of any global state:

Since the mapping operation takes place in the context of a DMA
operation, we could provide a ctx-type struct to the dma_memory_(un)map
--> address_space_(un)map functions that would contain a hash table. If
any memory allocations were needed, we would track them using that hash
table, which would require no locks. Moreover, if the initialization of
the hash table hurts the performance in the general case, we could use
instead a skip list, if the number of memory allocations is small (e.g.
< 100).

If a mapping operation does not take place in the context of a DMA
operation, we could pass NULL and the address_space_(un)map code would
fallback to the global bounce buffer. Having a fallback would allow for
a smooth transition.

As for the point raised in (2), we can have a limit on the allocated
pages, e.g. 1024. If a well-behaved guest reaches that limit, it will
resume the allocation once a request has completed. For misbehaving
guests, this means that their request will block. Is this reasonable?
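
To make the idea a bit more concrete, the context could look roughly
like this (a sketch only; all of the names are hypothetical):

    /* Hypothetical per-DMA mapping context. It is private to a single
     * DMA operation, so no locking is needed. */
    typedef struct DMAMapCtx {
        GHashTable *bounces;    /* host pointer -> bounce metadata */
        size_t bounce_pages;    /* bounded allocations, e.g. <= 1024 pages */
    } DMAMapCtx;

    /* A dma_memory_map() variant that takes the context; passing NULL
     * would fall back to the global bounce buffer. */
    void *dma_memory_map_ctx(DMAMapCtx *ctx, AddressSpace *as,
                             dma_addr_t addr, dma_addr_t *len,
                             DMADirection dir);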

>>> This would require major changes in your patches, and I'm not sure
>>> whether they are worth it for the single use case of tape devices...
>>
>> Well, I wouldn't narrow it down to tape devices. The scsi-generic
>> interface is the universal interface for all SCSI devices and the only
>> interface that is fully passthrough.
> 
> Sure, but what's the advantage of a fully passthrough interface over
> scsi-block?

This is not a matter of advantage between the two interfaces. A fully
passthrough interface is simply an alternative data path which one
should be able to test and benchmark.

>> So this patch would effectively
>> boost the baseline performance of SCSI devices. I think it's worth a try.
> 
> I think the first step is understanding what to do about the weird "&
> ~BDRV_SECTOR_MASK" case, then.

We can discuss this case in the "dma-helpers: Do not truncate
small qiovs" thread. I'm all for the removal of this check, but I was
hoping that Kevin would clarify whether it's still relevant.

Cheers,
Alex


[1] http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg01776.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg01884.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-25 10:10         ` Alex Pyrgiotis
@ 2016-02-25 10:22           ` Paolo Bonzini
  2016-02-25 11:19             ` Alex Pyrgiotis
  2016-02-26  9:20           ` Kevin Wolf
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 10:22 UTC (permalink / raw)
  To: Alex Pyrgiotis; +Cc: kwolf, famz, qemu-devel



On 25/02/2016 11:10, Alex Pyrgiotis wrote:
> All normal regions in a QEMUSGList point to an address range in the
> guest's RAM. The MMIO regions of QEMU's virtual devices, however, do not
> correspond to such an address range, so QEMU must create a bounce buffer
> to represent them. This bounce buffer is added to the I/O vector, which
> contains the rest of the mapped addresses, and is later given to a
> readv()/writev() call.

Correct.

>>> 9. This leads to a partial read/write and the mapping loop will resume
>>>    once the partial read/write() has finished.
> 
> The MMIO region is the trigger for a partial read/write, but it's not
> the actual reason. The actual reason is that there is only *one*
> *global* bounce buffer. This means that if it's in use or we need to
> use it twice, we will have to wait.

Yes.

>>>> However, it is not possible to do the same for ioctls.  This is actually
>>>> the reason why no one has ever tried to make scsi-generic do anything
>>>> but bounce-buffering. I think that your code breaks horribly in this
>>>> case, and I don't see a way to fix it, except for reverting to bounce
>>>> buffering.
> 
> Sure, you're right, there's no sensible way to break an ioctl()
> operation into many parts. However, I'd argue that we shouldn't need to, as it
> would be much better if the DMA operation was never restarted in the
> first place. Instead, if we dealt with the bigger issue of the global
> bounce buffer, we could kill two birds with one stone.
> 
> I see that there was an attempt [1] to replace the global bounce buffer
> with something more dynamic. In short, the objections [2] were the
> following:
> 
> 1. It introduced locking/unlocking a global mutex in the hot path as
>    well as a hash table lookup.
> 2. It allowed for unbounded memory allocations.
> 
> An improvement that would address (1) is to get rid of any global state:
> 
> Since the mapping operation takes place in the context of a DMA
> operation, we could provide a ctx-type struct to the dma_memory_(un)map
> --> address_space_(un)map functions that would contain a hash table. If
> any memory allocations were needed, we would track them using that hash
> table, which would require no locks. Moreover, if the initialization of
> the hash table hurts the performance in the general case, we could use
> instead a skip list, if the number of memory allocations is small (e.g.
> < 100).

You don't need a hash table either if you manage the bounce buffer list
per DMA transfer, and the simplest way to achieve that is to move the
bounce buffer from exec.c to dma-helpers.c entirely.

The patch could first introduce address_space_map_direct that never uses
the bounce buffer.  dma-helpers.c can call address_space_map_direct and,
if it fails, proceed to allocate (and fill if writing to the device) a
bounce buffer.  Since the QEMUSGList is mapped and unmapped
beginning-to-end, you can just use a FIFO queue.  The FIFO queue stores
a (QEMUSGList, buffer) tuple.  When unmapping a QEMUSGList you check if
it matches the head of the queue; if it does, you write back the
contents of the bounce buffer (for reads from the device) and free it.
If it doesn't match, you call address_space_unmap.
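
Roughly something like this, say (a sketch with hypothetical names,
reusing QSIMPLEQ from qemu/queue.h):

    typedef struct DMABounce {
        dma_addr_t addr;     /* guest address that the buffer shadows */
        dma_addr_t len;
        void *buffer;        /* host bounce buffer */
        QSIMPLEQ_ENTRY(DMABounce) entry;
    } DMABounce;

    typedef QSIMPLEQ_HEAD(, DMABounce) DMABounceQueue;

    /* FIFO order guarantees that a bounced range is at the head of the
     * queue by the time its unmap comes around. */
    static void dma_unmap_one(AddressSpace *as, DMABounceQueue *q,
                              void *host, dma_addr_t len, DMADirection dir)
    {
        DMABounce *b = QSIMPLEQ_FIRST(q);

        if (b && b->buffer == host) {
            if (dir == DMA_DIRECTION_FROM_DEVICE) {
                /* write what the device produced back to guest memory */
                dma_memory_write(as, b->addr, b->buffer, b->len);
            }
            QSIMPLEQ_REMOVE_HEAD(q, entry);
            qemu_vfree(b->buffer);
            g_free(b);
        } else {
            dma_memory_unmap(as, host, len, dir, len);
        }
    }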

Then, once the bounce buffer is implemented within dma-helpers.c, you
remove address_space_map and rename address_space_map_direct to
address_space_map.  cpu_register_map_client goes away.

The unbounded memory allocation can be avoided by bounding the number of
entries in the queue.  In the case of scsi-generic you could just as
well allow INT_MAX entries, because scsi-generic would do unbounded
memory allocation anyway for the bounce buffer.

Modulo the "& ~BDRV_SECTOR_MASK" issue, this actually seems simpler than
what this series was doing.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-25 10:22           ` Paolo Bonzini
@ 2016-02-25 11:19             ` Alex Pyrgiotis
  2016-02-25 13:01               ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Pyrgiotis @ 2016-02-25 11:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, famz, qemu-devel

Hi Paolo,

On 02/25/2016 12:22 PM, Paolo Bonzini wrote:
> 
> 
> On 25/02/2016 11:10, Alex Pyrgiotis wrote:
>> 8<... snip ...>8
>>
>> Sure, you're right, there's no sensible way to break an ioctl()
>> operation into many parts. However, I'd argue that we shouldn't need to, as it
>> would be much better if the DMA operation was never restarted in the
>> first place. Instead, if we dealt with the bigger issue of the global
>> bounce buffer, we could kill two birds with one stone.
>>
>> I see that there was an attempt [1] to replace the global bounce buffer
>> with something more dynamic. In short, the objections [2] were the
>> following:
>>
>> 1. It introduced locking/unlocking a global mutex in the hot path as
>>    well as a hash table lookup.
>> 2. It allowed for unbounded memory allocations.
>>
>> An improvement that would address (1) is to get rid of any global state:
>>
>> Since the mapping operation takes place in the context of a DMA
>> operation, we could provide a ctx-type struct to the dma_memory_(un)map
>> --> address_space_(un)map functions that would contain a hash table. If
>> any memory allocations were needed, we would track them using that hash
>> table, which would require no locks. Moreover, if the initialization of
>> the hash table hurts the performance in the general case, we could use
>> instead a skip list, if the number of memory allocations is small (e.g.
>> < 100).
> 
> You don't need a hash table either if you manage the bounce buffer list
> per DMA transfer, and the simplest way to achieve that is to move the
> bounce buffer from exec.c to dma-helpers.c entirely.
> 
> The patch could first introduce address_space_map_direct that never uses
> the bounce buffer.  dma-helpers.c can call address_space_map_direct and,
> if it fails, proceed to allocate (and fill if writing to the device) a
> bounce buffer.

You mean that address_space_map_direct() is a copy of the
address_space_map() code, minus the bounce buffer part which will be
handled in dma-helpers.c? If so, I agree.

Also, the buffer should be removed from address_space_unmap.

> Since the QEMUSGList is mapped and unmapped
> beginning-to-end, you can just use a FIFO queue.  The FIFO queue stores
> a (QEMUSGList, buffer) tuple.

I suppose that this queue is stored in the dbs structure of dma-helpers?
If so, I agree.

> When unmapping a QEMUSGList you check if
> it matches the head of the queue; if it does, you write back the
> contents of the bounce buffer (for reads from the device) and free it.
> If it doesn't match, you call address_space_unmap.

Agree.

> Then, once the bounce buffer is implemented within dma-helpers.c, you
> remove address_space_map and rename address_space_map_direct to
> address_space_map.  cpu_register_map_client goes away.

What about the other users of address_space_map()? Is the bounce buffer
used only for DMA operations? If so, I agree. Else, we need a fallback.

> The unbounded memory allocation can be avoided by bounding the number of
> entries in the queue.

Agree.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-25 11:19             ` Alex Pyrgiotis
@ 2016-02-25 13:01               ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 13:01 UTC (permalink / raw)
  To: Alex Pyrgiotis, Fam Zheng, Kevin Wolf, qemu-devel



On 25/02/2016 12:19, Alex Pyrgiotis wrote:
>> The patch could first introduce address_space_map_direct that never uses
>> the bounce buffer.  dma-helpers.c can call address_space_map_direct and,
>> if it fails, proceed to allocate (and fill if writing to the device) a
>> bounce buffer.
> 
> You mean that address_space_map_direct() is a copy of the
> address_space_map() code, minus the bounce buffer part which will be
> handled in dma-helpers.c? If so, I agree.

Yes.  In particular minus the:

    if (!memory_access_is_direct(mr, is_write))

part, hence the name. :)  If direct access isn't possible,
address_space_map_direct would always return NULL, like
address_space_map does when bounce.in_use is already true.
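
Something like this, simplified to a single translation step (the real
function would keep extending the mapping across contiguous regions,
like address_space_map does today):

    void *address_space_map_direct(AddressSpace *as, hwaddr addr,
                                   hwaddr *plen, bool is_write)
    {
        hwaddr len = *plen, xlat;
        MemoryRegion *mr;
        void *ptr;

        rcu_read_lock();
        mr = address_space_translate(as, addr, &xlat, &len, is_write);
        if (!memory_access_is_direct(mr, is_write)) {
            rcu_read_unlock();
            return NULL;    /* caller allocates its own bounce buffer */
        }
        memory_region_ref(mr);
        *plen = len;
        ptr = memory_region_get_ram_ptr(mr) + xlat;
        rcu_read_unlock();
        return ptr;
    }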

> Also, the buffer should be removed from address_space_unmap.

Right, though only after address_space_map_direct is renamed to
address_space_map.  That's also the point where cpu_register_map_client
disappears, and other parts of dma-helpers.c too such as the bottom half.

>> Since the QEMUSGList is mapped and unmapped
>> beginning-to-end, you can just use a FIFO queue.  The FIFO queue stores
>> a (QEMUSGList, buffer) tuple.
> 
> I suppose that this queue is stored in the dbs structure of dma-helpers?
> If so, I agree.

Yes.

>> Then, once the bounce buffer is implemented within dma-helpers.c, you
>> remove address_space_map and rename address_space_map_direct to
>> address_space_map.  cpu_register_map_client goes away.
> 
> What about the other users of address_space_map()? Is the bounce buffer
> used only for DMA operations? If so, I agree. Else, we need a fallback.

Other users usually fail if address_space_map cannot return a mapping as
big as requested.  They also do not use cpu_register_map_client.  Both
shortcomings mean that in practice they do not support the bounce buffer.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
  2016-02-25 10:10         ` Alex Pyrgiotis
  2016-02-25 10:22           ` Paolo Bonzini
@ 2016-02-26  9:20           ` Kevin Wolf
  1 sibling, 0 replies; 26+ messages in thread
From: Kevin Wolf @ 2016-02-26  9:20 UTC (permalink / raw)
  To: Alex Pyrgiotis; +Cc: Paolo Bonzini, famz, qemu-devel

On 25.02.2016 at 11:10, Alex Pyrgiotis wrote:
> > I think the first step is understanding what to do about the weird "&
> > ~BDRV_SECTOR_MASK" case, then.
> 
> We can discuss this case in the "dma-helpers: Do not truncate
> small qiovs" thread. I'm all for the removal of this check, but I was
> hoping that Kevin would clarify whether it's still relevant.

I think all of dma-helpers.c is currently sector based, and as long as
it is, this code needs to stay. You need to look at it in relation to
the divisions by 512 that the same function performs - the block layer
simply expects that if you have an I/O request for one sector, your qiov
is 512 bytes in length and not 527. So whenever the integer division
rounds down, the qiov needs to be trimmed, too.
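
Concretely, with the 527-byte example (this is just the existing
behaviour spelled out):

    uint64_t bytes   = dbs->iov.size;              /* 527             */
    uint64_t sectors = bytes >> BDRV_SECTOR_BITS;  /* 527 / 512 == 1  */
    uint64_t residue = bytes & ~BDRV_SECTOR_MASK;  /* 527 % 512 == 15 */

    /* The request covers sectors * 512 == 512 bytes, so the 15
     * trailing bytes must be discarded from the qiov to match: */
    if (residue) {
        qemu_iovec_discard_back(&dbs->iov, residue);
    }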

As soon as you convert the whole function to byte granularity (and the
block layer does have byte granularity APIs, so that shouldn't be a lot
of work), you don't have any implied rounding by integer divisions any
more and the qiov trimming can go away as well.

Kevin

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-02-26  9:20 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic Alex Pyrgiotis
2016-02-11 11:17   ` Paolo Bonzini
2016-02-19 11:50     ` Alex Pyrgiotis
2016-02-22 10:43       ` Paolo Bonzini
2016-02-25 10:10         ` Alex Pyrgiotis
2016-02-25 10:22           ` Paolo Bonzini
2016-02-25 11:19             ` Alex Pyrgiotis
2016-02-25 13:01               ` Paolo Bonzini
2016-02-26  9:20           ` Kevin Wolf
2015-12-16 16:55 ` [Qemu-devel] [PATCH 2/9] dma-helpers: Add support for ioctl operations Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs Alex Pyrgiotis
2016-02-11 11:08   ` Paolo Bonzini
2015-12-16 16:55 ` [Qemu-devel] [PATCH 4/9] scsi-generic: Add common functions Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 5/9] scsi-generic: Separate `sg_io_hdr' initializations Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 6/9] scsi-generic: Make request execution buf-specific Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 7/9] scsi-generic: Make data-copying logic clearer Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 8/9] scsi-generic: Factor out response interception Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 9/9] scsi-generic: Allow full scatter-gather support Alex Pyrgiotis
2015-12-16 18:16 ` [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Paolo Bonzini
2015-12-17  8:47   ` Alex Pyrgiotis
2015-12-17 10:31     ` Paolo Bonzini
2015-12-17 13:10       ` Alex Pyrgiotis
2015-12-17 13:13         ` Paolo Bonzini
2015-12-21 10:58           ` Alex Pyrgiotis
2016-01-11 13:30           ` Alex Pyrgiotis
