All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
@ 2016-05-23 12:54 Paolo Bonzini
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based Paolo Bonzini
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

scsi-block uses the block layer for reads and writes in order to avoid
allocating bounce buffers as big as the transferred data.  We know how
to split a large transfer to multiple reads and writes, and thus we can
use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).

Unfortunately, this has the side effect of eating the SCSI status except
in the very few cases where we can convert an errno code back to a SCSI
status.  It puts a big wrench in persistent reservations support in the
guest, for example.

Luckily, splitting a large transfer into multiple SBC commands is just as
easy, and this is what the last patch does.  It takes the original CDB,
patches in a modified starting sector and sector count, and executes the
SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
to SG_IO, so that s/g SCSI hosts keep the performance.

This rebases the patches on top of Eric's changes for byte-based
BlockBackend access and fixes a few bugs I knew about in the RFC.

Patches 1, 5 and 6 are new.

Paolo

Paolo Bonzini (7):
  dma-helpers: change interface to byte-based
  dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  scsi-disk: introduce a common base class
  scsi-disk: introduce dma_readv and dma_writev
  scsi-disk: add need_fua_emulation to SCSIDiskClass
  scsi-disk: introduce scsi_disk_req_check_error
  scsi-block: always use SG_IO

 dma-helpers.c        |  54 +++++--
 hw/block/nvme.c      |   6 +-
 hw/ide/ahci.c        |   6 +-
 hw/ide/core.c        |  20 ++-
 hw/ide/internal.h    |   6 +-
 hw/ide/macio.c       |   2 +-
 hw/scsi/scsi-disk.c  | 409 +++++++++++++++++++++++++++++++++++++--------------
 include/sysemu/dma.h |  20 +--
 trace-events         |   2 +-
 9 files changed, 371 insertions(+), 154 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 15:06   ` Eric Blake
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 dma-helpers.c        | 14 +++++++-------
 hw/block/nvme.c      |  6 +++---
 hw/ide/ahci.c        |  6 ++++--
 hw/ide/core.c        |  8 +++++---
 hw/scsi/scsi-disk.c  |  6 ++++--
 include/sysemu/dma.h |  6 +++---
 trace-events         |  2 +-
 7 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index a6cc15f..af07aab 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -192,18 +192,18 @@ static const AIOCBInfo dma_aiocb_info = {
 };
 
 BlockAIOCB *dma_blk_io(
-    BlockBackend *blk, QEMUSGList *sg, uint64_t sector_num,
+    BlockBackend *blk, QEMUSGList *sg, uint64_t offset,
     DMAIOFunc *io_func, BlockCompletionFunc *cb,
     void *opaque, DMADirection dir)
 {
     DMAAIOCB *dbs = blk_aio_get(&dma_aiocb_info, blk, cb, opaque);
 
-    trace_dma_blk_io(dbs, blk, sector_num, (dir == DMA_DIRECTION_TO_DEVICE));
+    trace_dma_blk_io(dbs, blk, offset, (dir == DMA_DIRECTION_TO_DEVICE));
 
     dbs->acb = NULL;
     dbs->blk = blk;
     dbs->sg = sg;
-    dbs->offset = sector_num << BDRV_SECTOR_BITS;
+    dbs->offset = offset;
     dbs->sg_cur_index = 0;
     dbs->sg_cur_byte = 0;
     dbs->dir = dir;
@@ -216,18 +216,18 @@ BlockAIOCB *dma_blk_io(
 
 
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
-                         QEMUSGList *sg, uint64_t sector,
+                         QEMUSGList *sg, uint64_t offset,
                          void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_blk_io(blk, sg, sector, blk_aio_preadv, cb, opaque,
+    return dma_blk_io(blk, sg, offset, blk_aio_preadv, cb, opaque,
                       DMA_DIRECTION_FROM_DEVICE);
 }
 
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
-                          QEMUSGList *sg, uint64_t sector,
+                          QEMUSGList *sg, uint64_t offset,
                           void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_blk_io(blk, sg, sector, blk_aio_pwritev, cb, opaque,
+    return dma_blk_io(blk, sg, offset, blk_aio_pwritev, cb, opaque,
                       DMA_DIRECTION_TO_DEVICE);
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 173988e..9faad29 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -239,7 +239,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
     uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
     uint64_t data_size = (uint64_t)nlb << data_shift;
-    uint64_t aio_slba  = slba << (data_shift - BDRV_SECTOR_BITS);
+    uint64_t data_offset = slba << data_shift;
     int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
     enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 
@@ -258,8 +258,8 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
     req->has_sg = true;
     dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct);
     req->aiocb = is_write ?
-        dma_blk_write(n->conf.blk, &req->qsg, aio_slba, nvme_rw_cb, req) :
-        dma_blk_read(n->conf.blk, &req->qsg, aio_slba, nvme_rw_cb, req);
+        dma_blk_write(n->conf.blk, &req->qsg, data_offset, nvme_rw_cb, req) :
+        dma_blk_read(n->conf.blk, &req->qsg, data_offset, nvme_rw_cb, req);
 
     return NVME_NO_COMPLETE;
 }
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f244bc0..502d4f1 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1006,7 +1006,8 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
         dma_acct_start(ide_state->blk, &ncq_tfs->acct,
                        &ncq_tfs->sglist, BLOCK_ACCT_READ);
         ncq_tfs->aiocb = dma_blk_read(ide_state->blk, &ncq_tfs->sglist,
-                                      ncq_tfs->lba, ncq_cb, ncq_tfs);
+                                      ncq_tfs->lba << BDRV_SECTOR_BITS,
+                                      ncq_cb, ncq_tfs);
         break;
     case WRITE_FPDMA_QUEUED:
         DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
@@ -1018,7 +1019,8 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
         dma_acct_start(ide_state->blk, &ncq_tfs->acct,
                        &ncq_tfs->sglist, BLOCK_ACCT_WRITE);
         ncq_tfs->aiocb = dma_blk_write(ide_state->blk, &ncq_tfs->sglist,
-                                       ncq_tfs->lba, ncq_cb, ncq_tfs);
+                                       ncq_tfs->lba << BDRV_SECTOR_BITS,
+                                       ncq_cb, ncq_tfs);
         break;
     default:
         DPRINTF(port, "error: unsupported NCQ command (0x%02x) received\n",
diff --git a/hw/ide/core.c b/hw/ide/core.c
index fe2bfba..7326189 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -799,6 +799,7 @@ static void ide_dma_cb(void *opaque, int ret)
     IDEState *s = opaque;
     int n;
     int64_t sector_num;
+    uint64_t offset;
     bool stay_active = false;
 
     if (ret == -ECANCELED) {
@@ -859,17 +860,18 @@ static void ide_dma_cb(void *opaque, int ret)
         return;
     }
 
+    offset = sector_num << BDRV_SECTOR_BITS;
     switch (s->dma_cmd) {
     case IDE_DMA_READ:
-        s->bus->dma->aiocb = dma_blk_read(s->blk, &s->sg, sector_num,
+        s->bus->dma->aiocb = dma_blk_read(s->blk, &s->sg, offset,
                                           ide_dma_cb, s);
         break;
     case IDE_DMA_WRITE:
-        s->bus->dma->aiocb = dma_blk_write(s->blk, &s->sg, sector_num,
+        s->bus->dma->aiocb = dma_blk_write(s->blk, &s->sg, offset,
                                            ide_dma_cb, s);
         break;
     case IDE_DMA_TRIM:
-        s->bus->dma->aiocb = dma_blk_io(s->blk, &s->sg, sector_num,
+        s->bus->dma->aiocb = dma_blk_io(s->blk, &s->sg, offset,
                                         ide_issue_trim, ide_dma_cb, s,
                                         DMA_DIRECTION_TO_DEVICE);
         break;
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index ce89c98..c374ea8 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -335,7 +335,8 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_READ);
         r->req.resid -= r->req.sg->size;
-        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
+        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg,
+                                    r->sector << BDRV_SECTOR_BITS,
                                     scsi_dma_complete, r);
     } else {
         scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
@@ -539,7 +540,8 @@ static void scsi_write_data(SCSIRequest *req)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE);
         r->req.resid -= r->req.sg->size;
-        r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg, r->sector,
+        r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg,
+                                     r->sector << BDRV_SECTOR_BITS,
                                      scsi_dma_complete, r);
     } else {
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index d6e96a4..f0e0c30 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -199,14 +199,14 @@ typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t offset,
                               BlockCompletionFunc *cb, void *opaque);
 
 BlockAIOCB *dma_blk_io(BlockBackend *blk,
-                       QEMUSGList *sg, uint64_t sector_num,
+                       QEMUSGList *sg, uint64_t offset,
                        DMAIOFunc *io_func, BlockCompletionFunc *cb,
                        void *opaque, DMADirection dir);
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
-                         QEMUSGList *sg, uint64_t sector,
+                         QEMUSGList *sg, uint64_t offset,
                          BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
-                          QEMUSGList *sg, uint64_t sector,
+                          QEMUSGList *sg, uint64_t offset,
                           BlockCompletionFunc *cb, void *opaque);
 uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
diff --git a/trace-events b/trace-events
index b53c354..e01b436 100644
--- a/trace-events
+++ b/trace-events
@@ -1143,7 +1143,7 @@ win_helper_done(uint32_t tl) "tl=%d"
 win_helper_retry(uint32_t tl) "tl=%d"
 
 # dma-helpers.c
-dma_blk_io(void *dbs, void *bs, int64_t sector_num, bool to_dev) "dbs=%p bs=%p sector_num=%" PRId64 " to_dev=%d"
+dma_blk_io(void *dbs, void *bs, int64_t offset, bool to_dev) "dbs=%p bs=%p offset=%" PRId64 " to_dev=%d"
 dma_aio_cancel(void *dbs) "dbs=%p"
 dma_complete(void *dbs, int ret, void *cb) "dbs=%p ret=%d cb=%p"
 dma_blk_cb(void *dbs, int ret) "dbs=%p ret=%d"
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 15:43   ` Eric Blake
  2016-05-24  2:47   ` Fam Zheng
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 3/7] scsi-disk: introduce a common base class Paolo Bonzini
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
because the original callback and opaque are gone by the time DMAIOFunc
is called.  On the other hand, the BlockBackend is usually derived
from those extra data that you could pass to the DMAIOFunc (in the
next patch, that would be the SCSIRequest).

So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
and blk_aio_writev's.  The new prototype loses the BlockBackend
and gains an extra opaque value which, in the case of dma_blk_readv
and dma_blk_writev, is of course used for the BlockBackend.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 dma-helpers.c        | 48 +++++++++++++++++++++++++++++++++++-------------
 hw/ide/core.c        | 14 ++++++++------
 hw/ide/internal.h    |  6 +++---
 hw/ide/macio.c       |  2 +-
 include/sysemu/dma.h | 12 ++++++------
 5 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index af07aab..92c5d55 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -70,7 +70,7 @@ void qemu_sglist_destroy(QEMUSGList *qsg)
 
 typedef struct {
     BlockAIOCB common;
-    BlockBackend *blk;
+    AioContext *ctx;
     BlockAIOCB *acb;
     QEMUSGList *sg;
     uint64_t offset;
@@ -80,6 +80,7 @@ typedef struct {
     QEMUIOVector iov;
     QEMUBH *bh;
     DMAIOFunc *io_func;
+    void *io_func_opaque;
 } DMAAIOCB;
 
 static void dma_blk_cb(void *opaque, int ret);
@@ -154,8 +155,7 @@ static void dma_blk_cb(void *opaque, int ret)
 
     if (dbs->iov.size == 0) {
         trace_dma_map_wait(dbs);
-        dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
-                             reschedule_dma, dbs);
+        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
         cpu_register_map_client(dbs->bh);
         return;
     }
@@ -164,8 +164,8 @@ static void dma_blk_cb(void *opaque, int ret)
         qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
     }
 
-    dbs->acb = dbs->io_func(dbs->blk, dbs->offset, &dbs->iov, 0,
-                            dma_blk_cb, dbs);
+    dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
+                            dma_blk_cb, dbs, dbs->io_func_opaque);
     assert(dbs->acb);
 }
 
@@ -191,23 +191,25 @@ static const AIOCBInfo dma_aiocb_info = {
     .cancel_async       = dma_aio_cancel,
 };
 
-BlockAIOCB *dma_blk_io(
-    BlockBackend *blk, QEMUSGList *sg, uint64_t offset,
-    DMAIOFunc *io_func, BlockCompletionFunc *cb,
+BlockAIOCB *dma_blk_io(AioContext *ctx,
+    QEMUSGList *sg, uint64_t opaque,
+    DMAIOFunc *io_func, void *io_func_opaque,
+    BlockCompletionFunc *cb,
     void *opaque, DMADirection dir)
 {
-    DMAAIOCB *dbs = blk_aio_get(&dma_aiocb_info, blk, cb, opaque);
+    DMAAIOCB *dbs = qemu_aio_get(&dma_aiocb_info, NULL, cb, opaque);
 
-    trace_dma_blk_io(dbs, blk, offset, (dir == DMA_DIRECTION_TO_DEVICE));
+    trace_dma_blk_io(dbs, io_func_opaque, offset, (dir == DMA_DIRECTION_TO_DEVICE));
 
     dbs->acb = NULL;
-    dbs->blk = blk;
     dbs->sg = sg;
+    dbs->ctx = ctx;
     dbs->offset = offset;
     dbs->sg_cur_index = 0;
     dbs->sg_cur_byte = 0;
     dbs->dir = dir;
     dbs->io_func = io_func;
+    dbs->io_func_opaque = io_func_opaque;
     dbs->bh = NULL;
     qemu_iovec_init(&dbs->iov, sg->nsg);
     dma_blk_cb(dbs, 0);
@@ -215,19 +217,39 @@ BlockAIOCB *dma_blk_io(
 }
 
 
+static
+BlockAIOCB *dma_blk_read_io_func(int64_t offset, QEMUIOVector *iov,
+                                 BlockCompletionFunc *cb, void *cb_opaque,
+                                 void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_preadv(blk, offset, iov, 0, cb, cb_opaque);
+}
+
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
                          QEMUSGList *sg, uint64_t offset,
                          void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_blk_io(blk, sg, offset, blk_aio_preadv, cb, opaque,
+    return dma_blk_io(blk_get_aio_context(blk),
+                      sg, offset, dma_blk_read_io_func, blk, cb, opaque,
                       DMA_DIRECTION_FROM_DEVICE);
 }
 
+static
+BlockAIOCB *dma_blk_write_io_func(int64_t offset, QEMUIOVector *iov,
+                                  BlockCompletionFunc *cb, void *cb_opaque,
+                                  void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_pwritev(blk, offset, iov, 0, cb, cb_opaque);
+}
+
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
                           QEMUSGList *sg, uint64_t offset,
                           void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_blk_io(blk, sg, offset, blk_aio_pwritev, cb, opaque,
+    return dma_blk_io(blk_get_aio_context(blk),
+                      sg, offset, dma_blk_write_io_func, blk, cb, opaque,
                       DMA_DIRECTION_TO_DEVICE);
 }
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 7326189..029f6b9 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -441,13 +441,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
     }
 }
 
-BlockAIOCB *ide_issue_trim(BlockBackend *blk,
-        int64_t offset, QEMUIOVector *qiov, BdrvRequestFlags flags,
-        BlockCompletionFunc *cb, void *opaque)
+BlockAIOCB *ide_issue_trim(
+        int64_t offset, QEMUIOVector *qiov,
+        BlockCompletionFunc *cb, void *cb_opaque, void *opaque)
 {
+    BlockBackend *blk = opaque;
     TrimAIOCB *iocb;
 
-    iocb = blk_aio_get(&trim_aiocb_info, blk, cb, opaque);
+    iocb = blk_aio_get(&trim_aiocb_info, blk, cb, cb_opaque);
     iocb->blk = blk;
     iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
     iocb->ret = 0;
@@ -871,8 +872,9 @@ static void ide_dma_cb(void *opaque, int ret)
                                            ide_dma_cb, s);
         break;
     case IDE_DMA_TRIM:
-        s->bus->dma->aiocb = dma_blk_io(s->blk, &s->sg, offset,
-                                        ide_issue_trim, ide_dma_cb, s,
+        s->bus->dma->aiocb = dma_blk_io(blk_get_aio_context(s->blk),
+                                        &s->sg, offset,
+                                        ide_issue_trim, s->blk, ide_dma_cb, s,
                                         DMA_DIRECTION_TO_DEVICE);
         break;
     default:
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index ceb9e59..773928a 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -613,9 +613,9 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
                         EndTransferFunc *end_transfer_func);
 void ide_transfer_stop(IDEState *s);
 void ide_set_inactive(IDEState *s, bool more);
-BlockAIOCB *ide_issue_trim(BlockBackend *blk,
-        int64_t offset, QEMUIOVector *qiov, BdrvRequestFlags flags,
-        BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *ide_issue_trim(
+        int64_t offset, QEMUIOVector *qiov,
+        BlockCompletionFunc *cb, void *cb_opaque, void *opaque);
 BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
                                QEMUIOVector *iov, int nb_sectors,
                                BlockCompletionFunc *cb, void *opaque);
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index d7d9c0f..42ad68a 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -230,7 +230,7 @@ static void pmac_dma_trim(BlockBackend *blk,
     s->io_buffer_index += io->len;
     io->len = 0;
 
-    s->bus->dma->aiocb = ide_issue_trim(blk, offset, &io->iov, 0, cb, io);
+    s->bus->dma->aiocb = ide_issue_trim(offset, &io->iov, cb, io, blk);
 }
 
 static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index f0e0c30..34c8eaf 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -194,14 +194,14 @@ void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 #endif
 
-typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t offset,
-                              QEMUIOVector *iov, BdrvRequestFlags flags,
-                              BlockCompletionFunc *cb, void *opaque);
+typedef BlockAIOCB *DMAIOFunc(int64_t offset, QEMUIOVector *iov,
+                              BlockCompletionFunc *cb, void *cb_opaque,
+                              void *opaque);
 
-BlockAIOCB *dma_blk_io(BlockBackend *blk,
+BlockAIOCB *dma_blk_io(AioContext *ctx,
                        QEMUSGList *sg, uint64_t offset,
-                       DMAIOFunc *io_func, BlockCompletionFunc *cb,
-                       void *opaque, DMADirection dir);
+                       DMAIOFunc *io_func, void *io_func_opaque,
+                       BlockCompletionFunc *cb, void *opaque, DMADirection dir);
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
                          QEMUSGList *sg, uint64_t offset,
                          BlockCompletionFunc *cb, void *opaque);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 3/7] scsi-disk: introduce a common base class
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based Paolo Bonzini
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 15:53   ` Eric Blake
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

This will be the place to add DMAIOFuncs in the next patch.  There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index c374ea8..eaadfd6 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -53,6 +53,8 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 #define DEFAULT_MAX_UNMAP_SIZE      (1 << 30)   /* 1 GB */
 #define DEFAULT_MAX_IO_SIZE         INT_MAX     /* 2 GB - 1 block */
 
+#define TYPE_SCSI_DISK_BASE         "scsi-disk-base"
+
 typedef struct SCSIDiskState SCSIDiskState;
 
 typedef struct SCSIDiskReq {
@@ -2656,6 +2658,21 @@ static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
 
 #endif
 
+static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->fw_name = "disk";
+    dc->reset = scsi_disk_reset;
+}
+
+static const TypeInfo scsi_disk_base_info = {
+    .name          = TYPE_SCSI_DISK_BASE,
+    .parent        = TYPE_SCSI_DEVICE,
+    .class_init    = scsi_disk_base_class_initfn,
+    .instance_size = sizeof(SCSIDiskState),
+};
+
 #define DEFINE_SCSI_DISK_PROPERTIES()                                \
     DEFINE_BLOCK_PROPERTIES(SCSIDiskState, qdev.conf),               \
     DEFINE_PROP_STRING("ver", SCSIDiskState, version),               \
@@ -2703,17 +2720,14 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
     sc->realize      = scsi_hd_realize;
     sc->alloc_req    = scsi_new_request;
     sc->unit_attention_reported = scsi_disk_unit_attention_reported;
-    dc->fw_name = "disk";
     dc->desc = "virtual SCSI disk";
-    dc->reset = scsi_disk_reset;
     dc->props = scsi_hd_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static const TypeInfo scsi_hd_info = {
     .name          = "scsi-hd",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_hd_class_initfn,
 };
 
@@ -2735,17 +2749,14 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void *data)
     sc->realize      = scsi_cd_realize;
     sc->alloc_req    = scsi_new_request;
     sc->unit_attention_reported = scsi_disk_unit_attention_reported;
-    dc->fw_name = "disk";
     dc->desc = "virtual SCSI CD-ROM";
-    dc->reset = scsi_disk_reset;
     dc->props = scsi_cd_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static const TypeInfo scsi_cd_info = {
     .name          = "scsi-cd",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_cd_class_initfn,
 };
 
@@ -2763,17 +2774,14 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
     sc->realize      = scsi_block_realize;
     sc->alloc_req    = scsi_block_new_request;
     sc->parse_cdb    = scsi_block_parse_cdb;
-    dc->fw_name = "disk";
     dc->desc = "SCSI block device passthrough";
-    dc->reset = scsi_disk_reset;
     dc->props = scsi_block_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static const TypeInfo scsi_block_info = {
     .name          = "scsi-block",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_block_class_initfn,
 };
 #endif
@@ -2811,13 +2819,13 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
 
 static const TypeInfo scsi_disk_info = {
     .name          = "scsi-disk",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_disk_class_initfn,
 };
 
 static void scsi_disk_register_types(void)
 {
+    type_register_static(&scsi_disk_base_info);
     type_register_static(&scsi_hd_info);
     type_register_static(&scsi_cd_info);
 #ifdef __linux__
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 3/7] scsi-disk: introduce a common base class Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 16:09   ` Eric Blake
  2016-06-01 19:07   ` Mark Cave-Ayland
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass Paolo Bonzini
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

These are replacements for blk_aio_preadv and blk_aio_pwritev that allow
customization of the data path.  They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.

This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 63 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 11 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index eaadfd6..4b5db59 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -55,7 +55,21 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 
 #define TYPE_SCSI_DISK_BASE         "scsi-disk-base"
 
+#define SCSI_DISK_BASE(obj) \
+     OBJECT_CHECK(SCSIDiskState, (obj), TYPE_SCSI_DISK_BASE)
+#define SCSI_DISK_BASE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(SCSIDiskClass, (klass), TYPE_SCSI_DISK_BASE)
+#define SCSI_DISK_BASE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(SCSIDiskClass, (obj), TYPE_SCSI_DISK_BASE)
+
 typedef struct SCSIDiskState SCSIDiskState;
+typedef struct SCSIDiskClass SCSIDiskClass;
+
+typedef struct SCSIDiskClass {
+    SCSIDeviceClass parent_class;
+    DMAIOFunc       *dma_readv;
+    DMAIOFunc       *dma_writev;
+} SCSIDiskClass;
 
 typedef struct SCSIDiskReq {
     SCSIRequest req;
@@ -317,6 +331,7 @@ done:
 static void scsi_do_read(SCSIDiskReq *r, int ret)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
 
     assert (r->req.aiocb == NULL);
 
@@ -337,16 +352,16 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_READ);
         r->req.resid -= r->req.sg->size;
-        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg,
-                                    r->sector << BDRV_SECTOR_BITS,
-                                    scsi_dma_complete, r);
+        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
+                                  r->req.sg, r->sector << BDRV_SECTOR_BITS,
+                                  sdc->dma_readv, r, scsi_dma_complete, r,
+                                  DMA_DIRECTION_FROM_DEVICE);
     } else {
         scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
                          r->qiov.size, BLOCK_ACCT_READ);
-        r->req.aiocb = blk_aio_preadv(s->qdev.conf.blk,
-                                      r->sector << BDRV_SECTOR_BITS, &r->qiov,
-                                      0, scsi_read_complete, r);
+        r->req.aiocb = sdc->dma_readv(r->sector, &r->qiov,
+                                      scsi_read_complete, r, r);
     }
 
 done:
@@ -506,6 +521,7 @@ static void scsi_write_data(SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
 
     /* No data transfer may already be in progress */
     assert(r->req.aiocb == NULL);
@@ -542,15 +558,16 @@ static void scsi_write_data(SCSIRequest *req)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE);
         r->req.resid -= r->req.sg->size;
-        r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg,
-                                     r->sector << BDRV_SECTOR_BITS,
                                      scsi_dma_complete, r);
+        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
+                                  r->req.sg, r->sector << BDRV_SECTOR_BITS,
+                                  sdc->dma_writev, r, scsi_dma_complete, r,
+                                  DMA_DIRECTION_TO_DEVICE);
     } else {
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
                          r->qiov.size, BLOCK_ACCT_WRITE);
-        r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
-                                       r->sector << BDRV_SECTOR_BITS, &r->qiov,
-                                       0, scsi_write_complete, r);
+        r->req.aiocb = sdc->dma_writev(r->sector << BDRV_SECTOR_BITS, &r->qiov,
+                                       scsi_write_complete, r, r);
     }
 }
 
@@ -2658,12 +2675,35 @@ static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
 
 #endif
 
+static
+BlockAIOCB *scsi_dma_readv(int64_t offset, QEMUIOVector *iov,
+                           BlockCompletionFunc *cb, void *cb_opaque,
+                           void *opaque)
+{
+    SCSIDiskReq *r = opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    return blk_aio_preadv(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);
+}
+
+static
+BlockAIOCB *scsi_dma_writev(int64_t offset, QEMUIOVector *iov,
+                            BlockCompletionFunc *cb, void *cb_opaque,
+                            void *opaque)
+{
+    SCSIDiskReq *r = opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    return blk_aio_pwritev(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);
+}
+
 static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
 
     dc->fw_name = "disk";
     dc->reset = scsi_disk_reset;
+    sdc->dma_readv = scsi_dma_readv;
+    sdc->dma_writev = scsi_dma_writev;
 }
 
 static const TypeInfo scsi_disk_base_info = {
@@ -2671,6 +2711,7 @@ static const TypeInfo scsi_disk_base_info = {
     .parent        = TYPE_SCSI_DEVICE,
     .class_init    = scsi_disk_base_class_initfn,
     .instance_size = sizeof(SCSIDiskState),
+    .class_size    = sizeof(SCSIDiskClass),
 };
 
 #define DEFINE_SCSI_DISK_PROPERTIES()                                \
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 16:34   ` Eric Blake
  2016-05-24  3:04   ` Fam Zheng
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 6/7] scsi-disk: introduce scsi_disk_req_check_error Paolo Bonzini
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

scsi-block will be able to do FUA just by passing the request through
to the LUN (which is also more efficient); there is no need to emulate
it like we do for scsi-disk.

Add a new method to distinguish this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4b5db59..f9870e9 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -69,6 +69,7 @@ typedef struct SCSIDiskClass {
     SCSIDeviceClass parent_class;
     DMAIOFunc       *dma_readv;
     DMAIOFunc       *dma_writev;
+    bool            (*need_fua_emulation)(SCSICommand *cmd);
 } SCSIDiskClass;
 
 typedef struct SCSIDiskReq {
@@ -78,6 +79,7 @@ typedef struct SCSIDiskReq {
     uint32_t sector_count;
     uint32_t buflen;
     bool started;
+    bool need_fua_emulation;
     struct iovec iov;
     QEMUIOVector qiov;
     BlockAcctCookie acct;
@@ -239,7 +241,7 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
         goto done;
     }
 
-    if (scsi_is_cmd_fua(&r->req.cmd)) {
+    if (r->need_fua_emulation) {
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
                          BLOCK_ACCT_FLUSH);
         r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_aio_complete, r);
@@ -416,7 +418,7 @@ static void scsi_read_data(SCSIRequest *req)
 
     first = !r->started;
     r->started = true;
-    if (first && scsi_is_cmd_fua(&r->req.cmd)) {
+    if (first && r->need_fua_emulation) {
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
                          BLOCK_ACCT_FLUSH);
         r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_do_read_cb, r);
@@ -2157,6 +2159,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
     uint32_t len;
     uint8_t command;
 
@@ -2215,6 +2218,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf)
         scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
         return 0;
     }
+    r->need_fua_emulation = sdc->need_fua_emulation(&r->req.cmd);
     if (r->sector_count == 0) {
         scsi_req_complete(&r->req, GOOD);
     }
@@ -2704,6 +2708,7 @@ static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
     dc->reset = scsi_disk_reset;
     sdc->dma_readv = scsi_dma_readv;
     sdc->dma_writev = scsi_dma_writev;
+    sdc->need_fua_emulation = scsi_is_cmd_fua;
 }
 
 static const TypeInfo scsi_disk_base_info = {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 6/7] scsi-disk: introduce scsi_disk_req_check_error
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 19:16   ` Eric Blake
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 7/7] scsi-block: always use SG_IO Paolo Bonzini
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

Commonize all the checks for canceled requests and errors.  The next patch
will add another case to check for, in order to handle passthrough commands.

There is no semantic change here; the only nontrivial modification is in
scsi_write_do_fua, where cancellation has been checked earlier by both
callers.  Thus, the check is replaced with an assertion.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 89 +++++++++++++----------------------------------------
 1 file changed, 22 insertions(+), 67 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f9870e9..f823781 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -179,6 +179,20 @@ static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
 }
 
+static bool scsi_disk_req_check_error(SCSIDiskReq *r, int ret, bool acct_failed)
+{
+    if (r->req.io_canceled) {
+        scsi_req_cancel_complete(&r->req);
+        return true;
+    }
+
+    if (ret < 0) {
+        return scsi_handle_rw_error(r, -ret, acct_failed);
+    }
+
+    return false;
+}
+
 static void scsi_aio_complete(void *opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -186,17 +200,10 @@ static void scsi_aio_complete(void *opaque, int ret)
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, true)) {
-            goto done;
-        }
-    }
-
     block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
     scsi_req_complete(&r->req, GOOD);
 
@@ -235,11 +242,7 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
     assert(r->req.aiocb == NULL);
-
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
-        goto done;
-    }
+    assert(!r->req.io_canceled);
 
     if (r->need_fua_emulation) {
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
@@ -249,26 +252,16 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
     }
 
     scsi_req_complete(&r->req, GOOD);
-
-done:
     scsi_req_unref(&r->req);
 }
 
 static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret)
 {
     assert(r->req.aiocb == NULL);
-
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, false)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, false)) {
-            goto done;
-        }
-    }
-
     r->sector += r->sector_count;
     r->sector_count = 0;
     if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
@@ -306,17 +299,10 @@ static void scsi_read_complete(void * opaque, int ret)
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, true)) {
-            goto done;
-        }
-    }
-
     block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
     DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->qiov.size);
 
@@ -336,18 +322,10 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
     SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
 
     assert (r->req.aiocb == NULL);
-
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, false)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, false)) {
-            goto done;
-        }
-    }
-
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
 
@@ -475,18 +453,10 @@ static void scsi_write_complete_noio(SCSIDiskReq *r, int ret)
     uint32_t n;
 
     assert (r->req.aiocb == NULL);
-
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, false)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, false)) {
-            goto done;
-        }
-    }
-
     n = r->qiov.size / 512;
     r->sector += n;
     r->sector_count -= n;
@@ -1621,18 +1591,10 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
     uint32_t nb_sectors;
 
     assert(r->req.aiocb == NULL);
-
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, false)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, false)) {
-            goto done;
-        }
-    }
-
     if (data->count > 0) {
         sector_num = ldq_be_p(&data->inbuf[0]);
         nb_sectors = ldl_be_p(&data->inbuf[8]) & 0xffffffffULL;
@@ -1732,17 +1694,10 @@ static void scsi_write_same_complete(void *opaque, int ret)
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
-    if (r->req.io_canceled) {
-        scsi_req_cancel_complete(&r->req);
+    if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
 
-    if (ret < 0) {
-        if (scsi_handle_rw_error(r, -ret, true)) {
-            goto done;
-        }
-    }
-
     block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 
     data->nb_sectors -= data->iov.iov_len / 512;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 7/7] scsi-block: always use SG_IO
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 6/7] scsi-disk: introduce scsi_disk_req_check_error Paolo Bonzini
@ 2016-05-23 12:54 ` Paolo Bonzini
  2016-05-23 19:49   ` Eric Blake
  2016-05-23 19:36 ` [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Mark Cave-Ayland
  2016-05-25 16:38 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  8 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-23 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status.  This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT.  Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.

This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 196 insertions(+), 18 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f823781..c0eefc0 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -83,6 +83,7 @@ typedef struct SCSIDiskReq {
     struct iovec iov;
     QEMUIOVector qiov;
     BlockAcctCookie acct;
+    unsigned char *status;
 } SCSIDiskReq;
 
 #define SCSI_DISK_F_REMOVABLE             0
@@ -190,6 +191,15 @@ static bool scsi_disk_req_check_error(SCSIDiskReq *r, int ret, bool acct_failed)
         return scsi_handle_rw_error(r, -ret, acct_failed);
     }
 
+    if (r->status && *r->status) {
+        if (acct_failed) {
+            SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+            block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
+        }
+        scsi_req_complete(&r->req, *r->status);
+        return true;
+    }
+
     return false;
 }
 
@@ -2556,16 +2566,145 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
     scsi_generic_read_device_identification(&s->qdev);
 }
 
+typedef struct SCSIBlockReq {
+    SCSIDiskReq req;
+    sg_io_hdr_t io_header;
+
+    /* Selected bytes of the original CDB, to be copied into our own CDB.  */
+    uint8_t cmd, cdb1, group_number;
+
+    /* CDB passed to SG_IO.  */
+    uint8_t cdb[16];
+} SCSIBlockReq;
+
+static BlockAIOCB *scsi_block_do_sgio(SCSIBlockReq *req,
+                                      int64_t offset, QEMUIOVector *iov,
+                                      int direction,
+                                      BlockCompletionFunc *cb, void *opaque)
+{
+    sg_io_hdr_t *io_header = &req->io_header;
+    SCSIDiskReq *r = &req->req;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    int nb_logical_blocks;
+    uint64_t lba;
+    BlockAIOCB *aiocb;
+
+    /* This is not supported yet.  It can only happen if the guest does
+     * reads and writes that are not aligned to one logical sectors
+     * _and_ cover multiple MemoryRegions.
+     */
+    assert(offset % s->qdev.blocksize == 0);
+    assert(iov->size % s->qdev.blocksize == 0);
+
+    io_header->interface_id = 'S';
+
+    /* The data transfer comes from the QEMUIOVector.  */
+    io_header->dxfer_direction = direction;
+    io_header->dxfer_len = iov->size;
+    io_header->dxferp = (void *)iov->iov;
+    io_header->iovec_count = iov->niov;
+    assert(io_header->iovec_count == iov->niov); /* no overflow! */
+
+    /* Build a new CDB with the LBA and length patched in, in case
+     * DMA helpers split the transfer in multiple segments.  Do not
+     * build a CDB smaller than what the guest wanted, and only build
+     * a larger one if strictly necessary.
+     */
+    io_header->cmdp = req->cdb;
+    lba = offset / s->qdev.blocksize;
+    nb_logical_blocks = io_header->dxfer_len / s->qdev.blocksize;
+
+    if ((req->cmd >> 5) == 0 && lba <= 0x1ffff) {
+        /* 6-byte CDB */
+        stl_be_p(&req->cdb[0], lba | (req->cmd << 24));
+        req->cdb[4] = nb_logical_blocks;
+        req->cdb[5] = 0;
+        io_header->cmd_len = 6;
+    } else if ((req->cmd >> 5) <= 1 && lba <= 0xffffffffULL) {
+        /* 10-byte CDB */
+        req->cdb[0] = (req->cmd & 0x1f) | 0x20;
+        req->cdb[1] = req->cdb1;
+        stl_be_p(&req->cdb[2], lba);
+        req->cdb[6] = req->group_number;
+        stw_be_p(&req->cdb[7], nb_logical_blocks);
+        req->cdb[9] = 0;
+        io_header->cmd_len = 10;
+    } else if ((req->cmd >> 5) != 4 && lba <= 0xffffffffULL) {
+        /* 12-byte CDB */
+        req->cdb[0] = (req->cmd & 0x1f) | 0xA0;
+        req->cdb[1] = req->cdb1;
+        stl_be_p(&req->cdb[2], lba);
+        stl_be_p(&req->cdb[6], nb_logical_blocks);
+        req->cdb[10] = req->group_number;
+        req->cdb[11] = 0;
+        io_header->cmd_len = 12;
+    } else {
+        /* 16-byte CDB */
+        req->cdb[0] = (req->cmd & 0x1f) | 0x80;
+        req->cdb[1] = req->cdb1;
+        stq_be_p(&req->cdb[2], lba);
+        stl_be_p(&req->cdb[10], nb_logical_blocks);
+        req->cdb[14] = req->group_number;
+        req->cdb[15] = 0;
+        io_header->cmd_len = 16;
+    }
+
+    /* The rest is as in scsi-generic.c.  */
+    io_header->mx_sb_len = sizeof(r->req.sense);
+    io_header->sbp = r->req.sense;
+    io_header->timeout = UINT_MAX;
+    io_header->usr_ptr = r;
+    io_header->flags = SG_FLAG_DIRECT_IO;
+
+    aiocb = blk_aio_ioctl(s->qdev.conf.blk, SG_IO, io_header, cb, opaque);
+    assert(aiocb != NULL);
+    return aiocb;
+}
+
+static bool scsi_block_no_fua(SCSICommand *cmd)
+{
+    return false;
+}
+
+static BlockAIOCB *scsi_block_dma_readv(int64_t offset,
+                                        QEMUIOVector *iov,
+                                        BlockCompletionFunc *cb, void *cb_opaque,
+                                        void *opaque)
+{
+    SCSIBlockReq *r = opaque;
+    return scsi_block_do_sgio(r, offset, iov,
+                              SG_DXFER_FROM_DEV, cb, cb_opaque);
+}
+
+static BlockAIOCB *scsi_block_dma_writev(int64_t offset,
+                                         QEMUIOVector *iov,
+                                         BlockCompletionFunc *cb, void *cb_opaque,
+                                         void *opaque)
+{
+    SCSIBlockReq *r = opaque;
+    return scsi_block_do_sgio(r, offset, iov,
+                              SG_DXFER_TO_DEV, cb, cb_opaque);
+}
+
 static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
 {
     switch (buf[0]) {
+    case VERIFY_10:
+    case VERIFY_12:
+    case VERIFY_16:
+        /* Check if BYTCHK == 0x01 (data-out buffer contains data
+         * for the number of logical blocks specified in the length
+         * field).  For other modes, do not use scatter/gather operation.
+         */
+        if ((buf[1] & 6) != 2) {
+            return false;
+        }
+        break;
+
     case READ_6:
     case READ_10:
     case READ_12:
     case READ_16:
-    case VERIFY_10:
-    case VERIFY_12:
-    case VERIFY_16:
     case WRITE_6:
     case WRITE_10:
     case WRITE_12:
@@ -2573,21 +2712,8 @@ static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
     case WRITE_VERIFY_10:
     case WRITE_VERIFY_12:
     case WRITE_VERIFY_16:
-        /* If we are not using O_DIRECT, we might read stale data from the
-         * host cache if writes were made using other commands than these
-         * ones (such as WRITE SAME or EXTENDED COPY, etc.).  So, without
-         * O_DIRECT everything must go through SG_IO.
-         */
-        if (!(blk_get_flags(s->qdev.conf.blk) & BDRV_O_NOCACHE)) {
-            break;
-        }
-
-        /* MMC writing cannot be done via pread/pwrite, because it sometimes
+        /* MMC writing cannot be done via DMA helpers, because it sometimes
          * involves writing beyond the maximum LBA or to negative LBA (lead-in).
-         * And once you do these writes, reading from the block device is
-         * unreliable, too.  It is even possible that reads deliver random data
-         * from the host page cache (this is probably a Linux bug).
-         *
          * We might use scsi_disk_dma_reqops as long as no writing commands are
          * seen, but performance usually isn't paramount on optical media.  So,
          * just make scsi-block operate the same as scsi-generic for them.
@@ -2605,6 +2731,54 @@ static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
 }
 
 
+static int32_t scsi_block_dma_command(SCSIRequest *req, uint8_t *buf)
+{
+    SCSIBlockReq *r = (SCSIBlockReq *)req;
+    r->cmd = req->cmd.buf[0];
+    switch (r->cmd >> 5) {
+    case 0:
+        /* 6-byte CDB.  */
+        r->cdb1 = r->group_number = 0;
+        break;
+    case 1:
+        /* 10-byte CDB.  */
+        r->cdb1 = req->cmd.buf[1];
+        r->group_number = req->cmd.buf[6];
+    case 4:
+        /* 12-byte CDB.  */
+        r->cdb1 = req->cmd.buf[1];
+        r->group_number = req->cmd.buf[10];
+        break;
+    case 5:
+        /* 16-byte CDB.  */
+        r->cdb1 = req->cmd.buf[1];
+        r->group_number = req->cmd.buf[14];
+        break;
+    default:
+        abort();
+    }
+
+    if (r->cdb1 & 0xe0) {
+        /* Protection information is not supported.  */
+        scsi_check_condition(&r->req, SENSE_CODE(INVALID_FIELD));
+        return 0;
+    }
+
+    r->req.status = &r->io_header.status;
+    return scsi_disk_dma_command(req, buf);
+}
+
+static const SCSIReqOps scsi_block_dma_reqops = {
+    .size         = sizeof(SCSIBlockReq),
+    .free_req     = scsi_free_request,
+    .send_command = scsi_block_dma_command,
+    .read_data    = scsi_read_data,
+    .write_data   = scsi_write_data,
+    .get_buf      = scsi_get_buf,
+    .load_request = scsi_disk_load_request,
+    .save_request = scsi_disk_save_request,
+};
+
 static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
                                            uint32_t lun, uint8_t *buf,
                                            void *hba_private)
@@ -2615,7 +2789,7 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
         return scsi_req_alloc(&scsi_generic_req_ops, &s->qdev, tag, lun,
                               hba_private);
     } else {
-        return scsi_req_alloc(&scsi_disk_dma_reqops, &s->qdev, tag, lun,
+        return scsi_req_alloc(&scsi_block_dma_reqops, &s->qdev, tag, lun,
                               hba_private);
     }
 }
@@ -2771,10 +2945,14 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
+    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
 
     sc->realize      = scsi_block_realize;
     sc->alloc_req    = scsi_block_new_request;
     sc->parse_cdb    = scsi_block_parse_cdb;
+    sdc->dma_readv   = scsi_block_dma_readv;
+    sdc->dma_writev  = scsi_block_dma_writev;
+    sdc->need_fua_emulation = scsi_block_no_fua;
     dc->desc = "SCSI block device passthrough";
     dc->props = scsi_block_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based Paolo Bonzini
@ 2016-05-23 15:06   ` Eric Blake
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 15:06 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  dma-helpers.c        | 14 +++++++-------
>  hw/block/nvme.c      |  6 +++---
>  hw/ide/ahci.c        |  6 ++++--
>  hw/ide/core.c        |  8 +++++---
>  hw/scsi/scsi-disk.c  |  6 ++++--
>  include/sysemu/dma.h |  6 +++---
>  trace-events         |  2 +-
>  7 files changed, 27 insertions(+), 21 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
@ 2016-05-23 15:43   ` Eric Blake
  2016-05-24  2:47   ` Fam Zheng
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 15:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
> because the original callback and opaque are gone by the time DMAIOFunc
> is called.  On the other hand, the BlockBackend is usually derived
> from those extra data that you could pass to the DMAIOFunc (in the
> next patch, that would be the SCSIRequest).
> 
> So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
> and blk_aio_writev's.  The new prototype loses the BlockBackend
> and gains an extra opaque value which, in the case of dma_blk_readv
> and dma_blk_writev, is of course used for the BlockBackend.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  dma-helpers.c        | 48 +++++++++++++++++++++++++++++++++++-------------
>  hw/ide/core.c        | 14 ++++++++------
>  hw/ide/internal.h    |  6 +++---
>  hw/ide/macio.c       |  2 +-
>  include/sysemu/dma.h | 12 ++++++------
>  5 files changed, 53 insertions(+), 29 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] scsi-disk: introduce a common base class
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 3/7] scsi-disk: introduce a common base class Paolo Bonzini
@ 2016-05-23 15:53   ` Eric Blake
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 15:53 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> This will be the place to add DMAIOFuncs in the next patch.  There
> are also a couple DeviceClass members that can be moved to the
> abstract class's initialization function.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 36 ++++++++++++++++++++++--------------
>  1 file changed, 22 insertions(+), 14 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
@ 2016-05-23 16:09   ` Eric Blake
  2016-06-01 19:07   ` Mark Cave-Ayland
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 16:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 1907 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> These are replacements for blk_aio_preadv and blk_aio_pwritev that allow
> customization of the data path.  They reuse the DMA helpers' DMAIOFunc
> callback type, so that the same function can be used in either the
> QEMUSGList or the bounce-buffered case.
> 
> This customization will be needed in the next patch to do zero-copy
> SG_IO on scsi-block.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 63 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 52 insertions(+), 11 deletions(-)
> 

>  
> +static
> +BlockAIOCB *scsi_dma_readv(int64_t offset, QEMUIOVector *iov,
> +                           BlockCompletionFunc *cb, void *cb_opaque,
> +                           void *opaque)
> +{
> +    SCSIDiskReq *r = opaque;
> +    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);

DO_UPCAST() is poorly named, and I've been trying to reduce its use in
the tree; can we use container_of() instead?

> +    return blk_aio_preadv(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);
> +}
> +
> +static
> +BlockAIOCB *scsi_dma_writev(int64_t offset, QEMUIOVector *iov,
> +                            BlockCompletionFunc *cb, void *cb_opaque,
> +                            void *opaque)
> +{
> +    SCSIDiskReq *r = opaque;
> +    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    return blk_aio_pwritev(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);

Are we ever going to want to pass flags (such as BDRV_REQ_FUA) on to
blk_aio_pwritev(), which would imply a flags argument to
scsi_dma_writev()?  But it doesn't have to happen now, so it's not
holding up my review.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass Paolo Bonzini
@ 2016-05-23 16:34   ` Eric Blake
  2016-05-24  3:04   ` Fam Zheng
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 16:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> scsi-block will be able to do FUA just by passing the request through
> to the LUN (which is also more efficient); there is no need to emulate
> it like we do for scsi-disk.
> 
> Add a new method to distinguish this.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 

> @@ -239,7 +241,7 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
>          goto done;
>      }
>  
> -    if (scsi_is_cmd_fua(&r->req.cmd)) {
> +    if (r->need_fua_emulation) {

BDRV_REQ_FUA is defined for writes,...

>          block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
>                           BLOCK_ACCT_FLUSH);
>          r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_aio_complete, r);
> @@ -416,7 +418,7 @@ static void scsi_read_data(SCSIRequest *req)
>  
>      first = !r->started;
>      r->started = true;
> -    if (first && scsi_is_cmd_fua(&r->req.cmd)) {
> +    if (first && r->need_fua_emulation) {

...but while FUA semantics are definitely defined in SCSI for reads, I
don't know if we've properly wired them up in our block layer for that
purpose. In particular, while bdrv_driver_pwritev() definitely has a
flush fallback if .supported_write_flags lacks BDRV_REQ_FUA,
bdrv_driver_preadv() does not currently look at flags.

The idea makes sense, but I don't know if it is incomplete without first
making sure FUA works on reads throughout the block layer.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] scsi-disk: introduce scsi_disk_req_check_error
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 6/7] scsi-disk: introduce scsi_disk_req_check_error Paolo Bonzini
@ 2016-05-23 19:16   ` Eric Blake
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 19:16 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> Commonize all the checks for canceled requests and errors.  The next patch
> will add another case to check for, in order to handle passthrough commands.
> 
> There is no semantic change here; the only nontrivial modification is in
> scsi_write_do_fua, where cancellation has been checked earlier by both
> callers.  Thus, the check is replaced with an assertion.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 89 +++++++++++++----------------------------------------
>  1 file changed, 22 insertions(+), 67 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 7/7] scsi-block: always use SG_IO Paolo Bonzini
@ 2016-05-23 19:36 ` Mark Cave-Ayland
  2016-05-24 22:59   ` Mark Cave-Ayland
  2016-05-25 16:38 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  8 siblings, 1 reply; 29+ messages in thread
From: Mark Cave-Ayland @ 2016-05-23 19:36 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 2367 bytes --]

On 23/05/16 13:54, Paolo Bonzini wrote:

> scsi-block uses the block layer for reads and writes in order to avoid
> allocating bounce buffers as big as the transferred data.  We know how
> to split a large transfer to multiple reads and writes, and thus we can
> use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
> SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).
> 
> Unfortunately, this has the side effect of eating the SCSI status except
> in the very few cases where we can convert an errno code back to a SCSI
> status.  It puts a big wrench in persistent reservations support in the
> guest, for example.
> 
> Luckily, splitting a large transfer into multiple SBC commands is just as
> easy, and this is what the last patch does.  It takes the original CDB,
> patches in a modified starting sector and sector count, and executes the
> SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
> to SG_IO, so that s/g SCSI hosts keep the performance.
> 
> This rebases the patches on top of Eric's changes for byte-based
> BlockBackend access and fixes a few bugs I knew about in the RFC.
> 
> Patches 1, 5 and 6 are new.
> 
> Paolo
> 
> Paolo Bonzini (7):
>   dma-helpers: change interface to byte-based
>   dma-helpers: change BlockBackend to opaque value in DMAIOFunc
>   scsi-disk: introduce a common base class
>   scsi-disk: introduce dma_readv and dma_writev
>   scsi-disk: add need_fua_emulation to SCSIDiskClass
>   scsi-disk: introduce scsi_disk_req_check_error
>   scsi-block: always use SG_IO
> 
>  dma-helpers.c        |  54 +++++--
>  hw/block/nvme.c      |   6 +-
>  hw/ide/ahci.c        |   6 +-
>  hw/ide/core.c        |  20 ++-
>  hw/ide/internal.h    |   6 +-
>  hw/ide/macio.c       |   2 +-
>  hw/scsi/scsi-disk.c  | 409 +++++++++++++++++++++++++++++++++++++--------------
>  include/sysemu/dma.h |  20 +--
>  trace-events         |   2 +-
>  9 files changed, 371 insertions(+), 154 deletions(-)

Hi Paolo,

I thought I'd give this patchset a spin with a view to seeing whether I
could switch the macio device back to the now byte-based dma-helpers,
but came up with a couple of compile errors. Attached is the minor diff
I applied in order to get a successful compile (apologies for not being
inline, however I couldn't get my mail client to stop wrapping incorrectly).


ATB,

Mark.


[-- Attachment #2: dma-helpers-compile-fix.patch --]
[-- Type: text/x-diff, Size: 1050 bytes --]

diff --git a/dma-helpers.c b/dma-helpers.c
index 92c5d55..b521d84 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -192,7 +192,7 @@ static const AIOCBInfo dma_aiocb_info = {
 };
 
 BlockAIOCB *dma_blk_io(AioContext *ctx,
-    QEMUSGList *sg, uint64_t opaque,
+    QEMUSGList *sg, uint64_t offset,
     DMAIOFunc *io_func, void *io_func_opaque,
     BlockCompletionFunc *cb,
     void *opaque, DMADirection dir)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 02faa3a..f7a23fc 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -540,7 +540,6 @@ static void scsi_write_data(SCSIRequest *req)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE);
         r->req.resid -= r->req.sg->size;
-                                     scsi_dma_complete, r);
         r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
                                   r->req.sg, r->sector << BDRV_SECTOR_BITS,
                                   sdc->dma_writev, r, scsi_dma_complete, r,

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] scsi-block: always use SG_IO
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 7/7] scsi-block: always use SG_IO Paolo Bonzini
@ 2016-05-23 19:49   ` Eric Blake
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2016-05-23 19:49 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

On 05/23/2016 06:54 AM, Paolo Bonzini wrote:
> Using pread/pwrite or io_submit has the advantage of eliminating the
> bounce buffer, but drops the SCSI status.  This keeps the guest from
> seeing unit attention codes, as well as statuses such as RESERVATION
> CONFLICT.  Because we know scsi-block operates on an SBC device we can
> still use the DMA helpers with SG_IO; just remember to patch the CDBs
> if the transfer is split into multiple segments.
> 
> This means that scsi-block will always use the thread-pool unfortunately,
> instead of respecting aio=native.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 196 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index f823781..c0eefc0 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -83,6 +83,7 @@ typedef struct SCSIDiskReq {
>      struct iovec iov;
>      QEMUIOVector qiov;
>      BlockAcctCookie acct;
> +    unsigned char *status;
>  } SCSIDiskReq;
>  
>  #define SCSI_DISK_F_REMOVABLE             0
> @@ -190,6 +191,15 @@ static bool scsi_disk_req_check_error(SCSIDiskReq *r, int ret, bool acct_failed)
>          return scsi_handle_rw_error(r, -ret, acct_failed);
>      }
>  
> +    if (r->status && *r->status) {
> +        if (acct_failed) {
> +            SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);

Another use of DO_UPCAST(); would container_of() be better?


> +
> +static BlockAIOCB *scsi_block_do_sgio(SCSIBlockReq *req,
> +                                      int64_t offset, QEMUIOVector *iov,
> +                                      int direction,
> +                                      BlockCompletionFunc *cb, void *opaque)
> +{
> +    sg_io_hdr_t *io_header = &req->io_header;
> +    SCSIDiskReq *r = &req->req;
> +    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    int nb_logical_blocks;
> +    uint64_t lba;
> +    BlockAIOCB *aiocb;
> +
> +    /* This is not supported yet.  It can only happen if the guest does
> +     * reads and writes that are not aligned to one logical sectors

s/one // ?

> +     * _and_ cover multiple MemoryRegions.
> +     */
> +    assert(offset % s->qdev.blocksize == 0);
> +    assert(iov->size % s->qdev.blocksize == 0);

Is asserting for something an ill-behaved guest can do the right behavior?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
  2016-05-23 15:43   ` Eric Blake
@ 2016-05-24  2:47   ` Fam Zheng
  2016-05-24  7:05     ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Fam Zheng @ 2016-05-24  2:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block, eblake

On Mon, 05/23 14:54, Paolo Bonzini wrote:
> Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
> because the original callback and opaque are gone by the time DMAIOFunc
> is called.  On the other hand, the BlockBackend is usually derived
> from those extra data that you could pass to the DMAIOFunc (in the
> next patch, that would be the SCSIRequest).
> 
> So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
> and blk_aio_writev's.  The new prototype loses the BlockBackend
> and gains an extra opaque value which, in the case of dma_blk_readv
> and dma_blk_writev, is of course used for the BlockBackend.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  dma-helpers.c        | 48 +++++++++++++++++++++++++++++++++++-------------
>  hw/ide/core.c        | 14 ++++++++------
>  hw/ide/internal.h    |  6 +++---
>  hw/ide/macio.c       |  2 +-
>  include/sysemu/dma.h | 12 ++++++------
>  5 files changed, 53 insertions(+), 29 deletions(-)
> 
> diff --git a/dma-helpers.c b/dma-helpers.c
> index af07aab..92c5d55 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -70,7 +70,7 @@ void qemu_sglist_destroy(QEMUSGList *qsg)
>  
>  typedef struct {
>      BlockAIOCB common;
> -    BlockBackend *blk;
> +    AioContext *ctx;
>      BlockAIOCB *acb;
>      QEMUSGList *sg;
>      uint64_t offset;
> @@ -80,6 +80,7 @@ typedef struct {
>      QEMUIOVector iov;
>      QEMUBH *bh;
>      DMAIOFunc *io_func;
> +    void *io_func_opaque;
>  } DMAAIOCB;
>  
>  static void dma_blk_cb(void *opaque, int ret);
> @@ -154,8 +155,7 @@ static void dma_blk_cb(void *opaque, int ret)
>  
>      if (dbs->iov.size == 0) {
>          trace_dma_map_wait(dbs);
> -        dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
> -                             reschedule_dma, dbs);
> +        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
>          cpu_register_map_client(dbs->bh);
>          return;
>      }
> @@ -164,8 +164,8 @@ static void dma_blk_cb(void *opaque, int ret)
>          qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
>      }
>  
> -    dbs->acb = dbs->io_func(dbs->blk, dbs->offset, &dbs->iov, 0,
> -                            dma_blk_cb, dbs);
> +    dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
> +                            dma_blk_cb, dbs, dbs->io_func_opaque);
>      assert(dbs->acb);
>  }
>  
> @@ -191,23 +191,25 @@ static const AIOCBInfo dma_aiocb_info = {
>      .cancel_async       = dma_aio_cancel,
>  };
>  
> -BlockAIOCB *dma_blk_io(
> -    BlockBackend *blk, QEMUSGList *sg, uint64_t offset,
> -    DMAIOFunc *io_func, BlockCompletionFunc *cb,
> +BlockAIOCB *dma_blk_io(AioContext *ctx,
> +    QEMUSGList *sg, uint64_t opaque,

As pointed out by Mark, s/opaque/offset/

> +    DMAIOFunc *io_func, void *io_func_opaque,
> +    BlockCompletionFunc *cb,
>      void *opaque, DMADirection dir)
>  {
> -    DMAAIOCB *dbs = blk_aio_get(&dma_aiocb_info, blk, cb, opaque);
> +    DMAAIOCB *dbs = qemu_aio_get(&dma_aiocb_info, NULL, cb, opaque);
>  
> -    trace_dma_blk_io(dbs, blk, offset, (dir == DMA_DIRECTION_TO_DEVICE));
> +    trace_dma_blk_io(dbs, io_func_opaque, offset, (dir == DMA_DIRECTION_TO_DEVICE));
>  
>      dbs->acb = NULL;
> -    dbs->blk = blk;
>      dbs->sg = sg;
> +    dbs->ctx = ctx;
>      dbs->offset = offset;
>      dbs->sg_cur_index = 0;
>      dbs->sg_cur_byte = 0;
>      dbs->dir = dir;
>      dbs->io_func = io_func;
> +    dbs->io_func_opaque = io_func_opaque;
>      dbs->bh = NULL;
>      qemu_iovec_init(&dbs->iov, sg->nsg);
>      dma_blk_cb(dbs, 0);
> @@ -215,19 +217,39 @@ BlockAIOCB *dma_blk_io(
>  }
>  
>  
> +static
> +BlockAIOCB *dma_blk_read_io_func(int64_t offset, QEMUIOVector *iov,
> +                                 BlockCompletionFunc *cb, void *cb_opaque,
> +                                 void *opaque)
> +{
> +    BlockBackend *blk = opaque;
> +    return blk_aio_preadv(blk, offset, iov, 0, cb, cb_opaque);
> +}
> +
>  BlockAIOCB *dma_blk_read(BlockBackend *blk,
>                           QEMUSGList *sg, uint64_t offset,
>                           void (*cb)(void *opaque, int ret), void *opaque)
>  {
> -    return dma_blk_io(blk, sg, offset, blk_aio_preadv, cb, opaque,
> +    return dma_blk_io(blk_get_aio_context(blk),
> +                      sg, offset, dma_blk_read_io_func, blk, cb, opaque,
>                        DMA_DIRECTION_FROM_DEVICE);
>  }
>  
> +static
> +BlockAIOCB *dma_blk_write_io_func(int64_t offset, QEMUIOVector *iov,
> +                                  BlockCompletionFunc *cb, void *cb_opaque,
> +                                  void *opaque)
> +{
> +    BlockBackend *blk = opaque;
> +    return blk_aio_pwritev(blk, offset, iov, 0, cb, cb_opaque);
> +}
> +
>  BlockAIOCB *dma_blk_write(BlockBackend *blk,
>                            QEMUSGList *sg, uint64_t offset,
>                            void (*cb)(void *opaque, int ret), void *opaque)
>  {
> -    return dma_blk_io(blk, sg, offset, blk_aio_pwritev, cb, opaque,
> +    return dma_blk_io(blk_get_aio_context(blk),
> +                      sg, offset, dma_blk_write_io_func, blk, cb, opaque,
>                        DMA_DIRECTION_TO_DEVICE);
>  }
>  
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 7326189..029f6b9 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -441,13 +441,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>      }
>  }
>  
> -BlockAIOCB *ide_issue_trim(BlockBackend *blk,
> -        int64_t offset, QEMUIOVector *qiov, BdrvRequestFlags flags,
> -        BlockCompletionFunc *cb, void *opaque)
> +BlockAIOCB *ide_issue_trim(
> +        int64_t offset, QEMUIOVector *qiov,
> +        BlockCompletionFunc *cb, void *cb_opaque, void *opaque)
>  {
> +    BlockBackend *blk = opaque;
>      TrimAIOCB *iocb;
>  
> -    iocb = blk_aio_get(&trim_aiocb_info, blk, cb, opaque);
> +    iocb = blk_aio_get(&trim_aiocb_info, blk, cb, cb_opaque);
>      iocb->blk = blk;
>      iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
>      iocb->ret = 0;
> @@ -871,8 +872,9 @@ static void ide_dma_cb(void *opaque, int ret)
>                                             ide_dma_cb, s);
>          break;
>      case IDE_DMA_TRIM:
> -        s->bus->dma->aiocb = dma_blk_io(s->blk, &s->sg, offset,
> -                                        ide_issue_trim, ide_dma_cb, s,
> +        s->bus->dma->aiocb = dma_blk_io(blk_get_aio_context(s->blk),
> +                                        &s->sg, offset,
> +                                        ide_issue_trim, s->blk, ide_dma_cb, s,
>                                          DMA_DIRECTION_TO_DEVICE);
>          break;
>      default:
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index ceb9e59..773928a 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -613,9 +613,9 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
>                          EndTransferFunc *end_transfer_func);
>  void ide_transfer_stop(IDEState *s);
>  void ide_set_inactive(IDEState *s, bool more);
> -BlockAIOCB *ide_issue_trim(BlockBackend *blk,
> -        int64_t offset, QEMUIOVector *qiov, BdrvRequestFlags flags,
> -        BlockCompletionFunc *cb, void *opaque);
> +BlockAIOCB *ide_issue_trim(
> +        int64_t offset, QEMUIOVector *qiov,
> +        BlockCompletionFunc *cb, void *cb_opaque, void *opaque);
>  BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
>                                 QEMUIOVector *iov, int nb_sectors,
>                                 BlockCompletionFunc *cb, void *opaque);
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index d7d9c0f..42ad68a 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -230,7 +230,7 @@ static void pmac_dma_trim(BlockBackend *blk,
>      s->io_buffer_index += io->len;
>      io->len = 0;
>  
> -    s->bus->dma->aiocb = ide_issue_trim(blk, offset, &io->iov, 0, cb, io);
> +    s->bus->dma->aiocb = ide_issue_trim(offset, &io->iov, cb, io, blk);
>  }
>  
>  static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
> diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
> index f0e0c30..34c8eaf 100644
> --- a/include/sysemu/dma.h
> +++ b/include/sysemu/dma.h
> @@ -194,14 +194,14 @@ void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
>  void qemu_sglist_destroy(QEMUSGList *qsg);
>  #endif
>  
> -typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t offset,
> -                              QEMUIOVector *iov, BdrvRequestFlags flags,
> -                              BlockCompletionFunc *cb, void *opaque);

Wasn't flags parameter added for a reason in d4f510eb3f? Would it be useful for
things like offloading FUA?

Fam

> +typedef BlockAIOCB *DMAIOFunc(int64_t offset, QEMUIOVector *iov,
> +                              BlockCompletionFunc *cb, void *cb_opaque,
> +                              void *opaque);
>  
> -BlockAIOCB *dma_blk_io(BlockBackend *blk,
> +BlockAIOCB *dma_blk_io(AioContext *ctx,
>                         QEMUSGList *sg, uint64_t offset,
> -                       DMAIOFunc *io_func, BlockCompletionFunc *cb,
> -                       void *opaque, DMADirection dir);
> +                       DMAIOFunc *io_func, void *io_func_opaque,
> +                       BlockCompletionFunc *cb, void *opaque, DMADirection dir);
>  BlockAIOCB *dma_blk_read(BlockBackend *blk,
>                           QEMUSGList *sg, uint64_t offset,
>                           BlockCompletionFunc *cb, void *opaque);
> -- 
> 2.5.5
> 
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass Paolo Bonzini
  2016-05-23 16:34   ` Eric Blake
@ 2016-05-24  3:04   ` Fam Zheng
  2016-05-24  7:02     ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Fam Zheng @ 2016-05-24  3:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block

On Mon, 05/23 14:54, Paolo Bonzini wrote:
> scsi-block will be able to do FUA just by passing the request through
> to the LUN (which is also more efficient); there is no need to emulate
> it like we do for scsi-disk.

Even for scsi-disk, shall we just use the block layer FUA fallback already by
passing BDRV_REQ_FUA in blk_aio_pwritev and simply the code in this series?

> 
> Add a new method to distinguish this.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4b5db59..f9870e9 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -69,6 +69,7 @@ typedef struct SCSIDiskClass {
>      SCSIDeviceClass parent_class;
>      DMAIOFunc       *dma_readv;
>      DMAIOFunc       *dma_writev;
> +    bool            (*need_fua_emulation)(SCSICommand *cmd);
>  } SCSIDiskClass;
>  
>  typedef struct SCSIDiskReq {
> @@ -78,6 +79,7 @@ typedef struct SCSIDiskReq {
>      uint32_t sector_count;
>      uint32_t buflen;
>      bool started;
> +    bool need_fua_emulation;
>      struct iovec iov;
>      QEMUIOVector qiov;
>      BlockAcctCookie acct;
> @@ -239,7 +241,7 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
>          goto done;
>      }
>  
> -    if (scsi_is_cmd_fua(&r->req.cmd)) {
> +    if (r->need_fua_emulation) {

Should this evaluate a call to r->need_fua_emulation() instead of the function
pointer itself?

>          block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
>                           BLOCK_ACCT_FLUSH);
>          r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_aio_complete, r);
> @@ -416,7 +418,7 @@ static void scsi_read_data(SCSIRequest *req)
>  
>      first = !r->started;
>      r->started = true;
> -    if (first && scsi_is_cmd_fua(&r->req.cmd)) {
> +    if (first && r->need_fua_emulation) {
>          block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
>                           BLOCK_ACCT_FLUSH);
>          r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_do_read_cb, r);
> @@ -2157,6 +2159,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf)
>  {
>      SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
>      SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
>      uint32_t len;
>      uint8_t command;
>  
> @@ -2215,6 +2218,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf)
>          scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
>          return 0;
>      }
> +    r->need_fua_emulation = sdc->need_fua_emulation(&r->req.cmd);
>      if (r->sector_count == 0) {
>          scsi_req_complete(&r->req, GOOD);
>      }
> @@ -2704,6 +2708,7 @@ static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
>      dc->reset = scsi_disk_reset;
>      sdc->dma_readv = scsi_dma_readv;
>      sdc->dma_writev = scsi_dma_writev;
> +    sdc->need_fua_emulation = scsi_is_cmd_fua;
>  }
>  
>  static const TypeInfo scsi_disk_base_info = {
> -- 
> 2.5.5
> 
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass
  2016-05-24  3:04   ` Fam Zheng
@ 2016-05-24  7:02     ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-24  7:02 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel, qemu-block



On 24/05/2016 05:04, Fam Zheng wrote:
> On Mon, 05/23 14:54, Paolo Bonzini wrote:
> > scsi-block will be able to do FUA just by passing the request through
> > to the LUN (which is also more efficient); there is no need to emulate
> > it like we do for scsi-disk.
> 
> Even for scsi-disk, shall we just use the block layer FUA fallback already by
> passing BDRV_REQ_FUA in blk_aio_pwritev and simply the code in this series?

That's a good idea but it should be done separately, I think.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  2016-05-24  2:47   ` Fam Zheng
@ 2016-05-24  7:05     ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-24  7:05 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel, qemu-block



On 24/05/2016 04:47, Fam Zheng wrote:
> > -typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t offset,
> > -                              QEMUIOVector *iov, BdrvRequestFlags flags,
> > -                              BlockCompletionFunc *cb, void *opaque);
>
> Wasn't flags parameter added for a reason in d4f510eb3f? Would it be useful for
> things like offloading FUA?

It was added only to give DMAIOFunc the same prototype as
blk_aio_p{read,write}v.  This is not useful anymore.

In the case of scsi-disk, the flags argument is not useful for
offloading FUA.  Instead, the DMAIOFunc will look into the opaque
argument (a SCSIDiskReq) and build the flags argument from there.

In the case of dma_aio_readv/writev, if you want to add a flags argument
you'll have to build a

    struct DMAAioData {
        BlockBackend *blk;
        BdrvRequestFlags flags;
    }

and pass it as the opaque argument.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
  2016-05-23 19:36 ` [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Mark Cave-Ayland
@ 2016-05-24 22:59   ` Mark Cave-Ayland
  2016-05-25  8:45     ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Cave-Ayland @ 2016-05-24 22:59 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

On 23/05/16 20:36, Mark Cave-Ayland wrote:

> On 23/05/16 13:54, Paolo Bonzini wrote:
> 
>> scsi-block uses the block layer for reads and writes in order to avoid
>> allocating bounce buffers as big as the transferred data.  We know how
>> to split a large transfer to multiple reads and writes, and thus we can
>> use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
>> SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).
>>
>> Unfortunately, this has the side effect of eating the SCSI status except
>> in the very few cases where we can convert an errno code back to a SCSI
>> status.  It puts a big wrench in persistent reservations support in the
>> guest, for example.
>>
>> Luckily, splitting a large transfer into multiple SBC commands is just as
>> easy, and this is what the last patch does.  It takes the original CDB,
>> patches in a modified starting sector and sector count, and executes the
>> SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
>> to SG_IO, so that s/g SCSI hosts keep the performance.
>>
>> This rebases the patches on top of Eric's changes for byte-based
>> BlockBackend access and fixes a few bugs I knew about in the RFC.
>>
>> Patches 1, 5 and 6 are new.
>>
>> Paolo
>>
>> Paolo Bonzini (7):
>>   dma-helpers: change interface to byte-based
>>   dma-helpers: change BlockBackend to opaque value in DMAIOFunc
>>   scsi-disk: introduce a common base class
>>   scsi-disk: introduce dma_readv and dma_writev
>>   scsi-disk: add need_fua_emulation to SCSIDiskClass
>>   scsi-disk: introduce scsi_disk_req_check_error
>>   scsi-block: always use SG_IO
>>
>>  dma-helpers.c        |  54 +++++--
>>  hw/block/nvme.c      |   6 +-
>>  hw/ide/ahci.c        |   6 +-
>>  hw/ide/core.c        |  20 ++-
>>  hw/ide/internal.h    |   6 +-
>>  hw/ide/macio.c       |   2 +-
>>  hw/scsi/scsi-disk.c  | 409 +++++++++++++++++++++++++++++++++++++--------------
>>  include/sysemu/dma.h |  20 +--
>>  trace-events         |   2 +-
>>  9 files changed, 371 insertions(+), 154 deletions(-)
> 
> Hi Paolo,
> 
> I thought I'd give this patchset a spin with a view to seeing whether I
> could switch the macio device back to the now byte-based dma-helpers,
> but came up with a couple of compile errors. Attached is the minor diff
> I applied in order to get a successful compile (apologies for not being
> inline, however I couldn't get my mail client to stop wrapping incorrectly).

After some head-scratching, I've finally managed to install and boot
Darwin PPC using the new byte-based DMA helpers facilitated by this
patch in the macio IDE driver. Just switching over to the new helpers
provided by this patch seemingly allowed the installation to proceed,
but the resulting image was corrupt and refused to boot.

I eventually traced the corruption down to this section of code in
dma_blk_cb() which was incorrectly truncating the unaligned iovecs:

if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
    qemu_iovec_discard_back(&dbs->iov, dbs->iov.size &
        ~BDRV_SECTOR_MASK);
}

This was introduced in
http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to
handle non-sector aligned SG lists, although given that this is a
restriction of one particular implementation (PRDT) I'm not sure whether
a plain revert is the correct thing to do or whether an alternative
solution needs to be found.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
  2016-05-24 22:59   ` Mark Cave-Ayland
@ 2016-05-25  8:45     ` Paolo Bonzini
  2016-05-25  9:13       ` Mark Cave-Ayland
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-25  8:45 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: qemu-block



On 25/05/2016 00:59, Mark Cave-Ayland wrote:
> I eventually traced the corruption down to this section of code in
> dma_blk_cb() which was incorrectly truncating the unaligned iovecs:
> 
> if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
>     qemu_iovec_discard_back(&dbs->iov, dbs->iov.size &
>         ~BDRV_SECTOR_MASK);
> }
> 
> This was introduced in
> http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to
> handle non-sector aligned SG lists, although given that this is a
> restriction of one particular implementation (PRDT) I'm not sure whether
> a plain revert is the correct thing to do or whether an alternative
> solution needs to be found.

Yeah, I have a plan for this bit.  It's related to this code I'm adding
in patch 7:

+    /* This is not supported yet.  It can only happen if the guest does
+     * reads and writes that are not aligned to one logical sectors
+     * _and_ cover multiple MemoryRegions.
+     */
+    assert(offset % s->qdev.blocksize == 0);
+    assert(iov->size % s->qdev.blocksize == 0);

The idea behind the "if" is that the I/O code cannot deal with a number
of bytes that is not a multiple of the logical sector size.  These
assertions could be removed by generalizing the "if" to an arbitrary
mask, in this case s->qdev.blocksize - 1.

There are two things that are wrong however in the logic.  First, the
"if" must be moved before the "if (dbs->iov.size & ~BDRV_SECTOR_MASK)",
because the call to qemu_iovec_discard_back can result in a zero-byte
QEMUIOVector.  Second, the sg_cur_* variables must be rewound too.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
  2016-05-25  8:45     ` Paolo Bonzini
@ 2016-05-25  9:13       ` Mark Cave-Ayland
  2016-05-25 10:11         ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Cave-Ayland @ 2016-05-25  9:13 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

On 25/05/16 09:45, Paolo Bonzini wrote:

> On 25/05/2016 00:59, Mark Cave-Ayland wrote:
>> I eventually traced the corruption down to this section of code in
>> dma_blk_cb() which was incorrectly truncating the unaligned iovecs:
>>
>> if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
>>     qemu_iovec_discard_back(&dbs->iov, dbs->iov.size &
>>         ~BDRV_SECTOR_MASK);
>> }
>>
>> This was introduced in
>> http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to
>> handle non-sector aligned SG lists, although given that this is a
>> restriction of one particular implementation (PRDT) I'm not sure whether
>> a plain revert is the correct thing to do or whether an alternative
>> solution needs to be found.
> 
> Yeah, I have a plan for this bit.  It's related to this code I'm adding
> in patch 7:
> 
> +    /* This is not supported yet.  It can only happen if the guest does
> +     * reads and writes that are not aligned to one logical sectors
> +     * _and_ cover multiple MemoryRegions.
> +     */
> +    assert(offset % s->qdev.blocksize == 0);
> +    assert(iov->size % s->qdev.blocksize == 0);
> 
> The idea behind the "if" is that the I/O code cannot deal with a number
> of bytes that is not a multiple of the logical sector size.  These
> assertions could be removed by generalizing the "if" to an arbitrary
> mask, in this case s->qdev.blocksize - 1.
> 
> There are two things that are wrong however in the logic.  First, the
> "if" must be moved before the "if (dbs->iov.size & ~BDRV_SECTOR_MASK)",
> because the call to qemu_iovec_discard_back can result in a zero-byte
> QEMUIOVector.  Second, the sg_cur_* variables must be rewound too.

Great - as long as you're aware of this I'll wait for the next version
of the patchset to come out, and then at least I can re-run the macio
installation test to ensure that there are no further corruption issues
with unaligned DMA accesses.

I'd like to add my thanks to everyone who has worked hard on allowing
byte-based accesses in the I/O layer since on my local test case it
makes a significant improvement to the macio emulation. With my basic
macio conversion applied on top of this patchset, the installation time
for a complete Darwin PPC install drops from 25 mins to about 9 mins on
my laptop which is a clear 2.5x speed increase over the existing code.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
  2016-05-25  9:13       ` Mark Cave-Ayland
@ 2016-05-25 10:11         ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2016-05-25 10:11 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: qemu-block



On 25/05/2016 11:13, Mark Cave-Ayland wrote:
> Great - as long as you're aware of this I'll wait for the next version
> of the patchset to come out, and then at least I can re-run the macio
> installation test to ensure that there are no further corruption issues
> with unaligned DMA accesses.

If I get an ack for patches 1 and 2, I will just put these patches in a
pull request (obviously with your fixes---I had forgotten to redo the
git format-patch step).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block
  2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-05-23 19:36 ` [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Mark Cave-Ayland
@ 2016-05-25 16:38 ` Kevin Wolf
  8 siblings, 0 replies; 29+ messages in thread
From: Kevin Wolf @ 2016-05-25 16:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, qemu-block

Am 23.05.2016 um 14:54 hat Paolo Bonzini geschrieben:
> scsi-block uses the block layer for reads and writes in order to avoid
> allocating bounce buffers as big as the transferred data.  We know how
> to split a large transfer to multiple reads and writes, and thus we can
> use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
> SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).
> 
> Unfortunately, this has the side effect of eating the SCSI status except
> in the very few cases where we can convert an errno code back to a SCSI
> status.  It puts a big wrench in persistent reservations support in the
> guest, for example.
> 
> Luckily, splitting a large transfer into multiple SBC commands is just as
> easy, and this is what the last patch does.  It takes the original CDB,
> patches in a modified starting sector and sector count, and executes the
> SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
> to SG_IO, so that s/g SCSI hosts keep the performance.
> 
> This rebases the patches on top of Eric's changes for byte-based
> BlockBackend access and fixes a few bugs I knew about in the RFC.

Thanks, applied patches 1 and 2 to the block branch (with the typo fix
in patch 2).

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev
  2016-05-23 12:54 ` [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
  2016-05-23 16:09   ` Eric Blake
@ 2016-06-01 19:07   ` Mark Cave-Ayland
  2016-06-03  2:56     ` xiaoqiang zhao
  1 sibling, 1 reply; 29+ messages in thread
From: Mark Cave-Ayland @ 2016-06-01 19:07 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block

On 23/05/16 13:54, Paolo Bonzini wrote:

> These are replacements for blk_aio_preadv and blk_aio_pwritev that allow
> customization of the data path.  They reuse the DMA helpers' DMAIOFunc
> callback type, so that the same function can be used in either the
> QEMUSGList or the bounce-buffered case.
> 
> This customization will be needed in the next patch to do zero-copy
> SG_IO on scsi-block.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 63 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 52 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index eaadfd6..4b5db59 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -55,7 +55,21 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
>  
>  #define TYPE_SCSI_DISK_BASE         "scsi-disk-base"
>  
> +#define SCSI_DISK_BASE(obj) \
> +     OBJECT_CHECK(SCSIDiskState, (obj), TYPE_SCSI_DISK_BASE)
> +#define SCSI_DISK_BASE_CLASS(klass) \
> +     OBJECT_CLASS_CHECK(SCSIDiskClass, (klass), TYPE_SCSI_DISK_BASE)
> +#define SCSI_DISK_BASE_GET_CLASS(obj) \
> +     OBJECT_GET_CLASS(SCSIDiskClass, (obj), TYPE_SCSI_DISK_BASE)
> +
>  typedef struct SCSIDiskState SCSIDiskState;
> +typedef struct SCSIDiskClass SCSIDiskClass;
> +
> +typedef struct SCSIDiskClass {
> +    SCSIDeviceClass parent_class;
> +    DMAIOFunc       *dma_readv;
> +    DMAIOFunc       *dma_writev;
> +} SCSIDiskClass;
>  
>  typedef struct SCSIDiskReq {
>      SCSIRequest req;
> @@ -317,6 +331,7 @@ done:
>  static void scsi_do_read(SCSIDiskReq *r, int ret)
>  {
>      SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
>  
>      assert (r->req.aiocb == NULL);
>  
> @@ -337,16 +352,16 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
>      if (r->req.sg) {
>          dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_READ);
>          r->req.resid -= r->req.sg->size;
> -        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg,
> -                                    r->sector << BDRV_SECTOR_BITS,
> -                                    scsi_dma_complete, r);
> +        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
> +                                  r->req.sg, r->sector << BDRV_SECTOR_BITS,
> +                                  sdc->dma_readv, r, scsi_dma_complete, r,
> +                                  DMA_DIRECTION_FROM_DEVICE);
>      } else {
>          scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
>          block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
>                           r->qiov.size, BLOCK_ACCT_READ);
> -        r->req.aiocb = blk_aio_preadv(s->qdev.conf.blk,
> -                                      r->sector << BDRV_SECTOR_BITS, &r->qiov,
> -                                      0, scsi_read_complete, r);
> +        r->req.aiocb = sdc->dma_readv(r->sector, &r->qiov,
> +                                      scsi_read_complete, r, r);
>      }
>  
>  done:
> @@ -506,6 +521,7 @@ static void scsi_write_data(SCSIRequest *req)
>  {
>      SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
>      SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
>  
>      /* No data transfer may already be in progress */
>      assert(r->req.aiocb == NULL);
> @@ -542,15 +558,16 @@ static void scsi_write_data(SCSIRequest *req)
>      if (r->req.sg) {
>          dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE);
>          r->req.resid -= r->req.sg->size;
> -        r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg,
> -                                     r->sector << BDRV_SECTOR_BITS,
>                                       scsi_dma_complete, r);
> +        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
> +                                  r->req.sg, r->sector << BDRV_SECTOR_BITS,
> +                                  sdc->dma_writev, r, scsi_dma_complete, r,
> +                                  DMA_DIRECTION_TO_DEVICE);
>      } else {
>          block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
>                           r->qiov.size, BLOCK_ACCT_WRITE);
> -        r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
> -                                       r->sector << BDRV_SECTOR_BITS, &r->qiov,
> -                                       0, scsi_write_complete, r);
> +        r->req.aiocb = sdc->dma_writev(r->sector << BDRV_SECTOR_BITS, &r->qiov,
> +                                       scsi_write_complete, r, r);
>      }
>  }
>  
> @@ -2658,12 +2675,35 @@ static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
>  
>  #endif
>  
> +static
> +BlockAIOCB *scsi_dma_readv(int64_t offset, QEMUIOVector *iov,
> +                           BlockCompletionFunc *cb, void *cb_opaque,
> +                           void *opaque)
> +{
> +    SCSIDiskReq *r = opaque;
> +    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    return blk_aio_preadv(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);
> +}
> +
> +static
> +BlockAIOCB *scsi_dma_writev(int64_t offset, QEMUIOVector *iov,
> +                            BlockCompletionFunc *cb, void *cb_opaque,
> +                            void *opaque)
> +{
> +    SCSIDiskReq *r = opaque;
> +    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    return blk_aio_pwritev(s->qdev.conf.blk, offset, iov, 0, cb, cb_opaque);
> +}
> +
>  static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> +    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
>  
>      dc->fw_name = "disk";
>      dc->reset = scsi_disk_reset;
> +    sdc->dma_readv = scsi_dma_readv;
> +    sdc->dma_writev = scsi_dma_writev;
>  }
>  
>  static const TypeInfo scsi_disk_base_info = {
> @@ -2671,6 +2711,7 @@ static const TypeInfo scsi_disk_base_info = {
>      .parent        = TYPE_SCSI_DEVICE,
>      .class_init    = scsi_disk_base_class_initfn,
>      .instance_size = sizeof(SCSIDiskState),
> +    .class_size    = sizeof(SCSIDiskClass),
>  };
>  
>  #define DEFINE_SCSI_DISK_PROPERTIES()                                \

Hi Paolo,

This patch appears to break qemu-system-sparc booting from CDROM with
the following command line:

./qemu-system-sparc -cdrom debian-40r4a-sparc-netinst.iso -boot d

Instead of booting straight into SILO, OpenBIOS hangs when trying to
read from the CDROM device.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev
  2016-06-01 19:07   ` Mark Cave-Ayland
@ 2016-06-03  2:56     ` xiaoqiang zhao
  2016-06-03  5:22       ` Mark Cave-Ayland
  0 siblings, 1 reply; 29+ messages in thread
From: xiaoqiang zhao @ 2016-06-03  2:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Mark Cave-Ayland, qemu-block



在 2016年06月02日 03:07, Mark Cave-Ayland 写道:
> On 23/05/16 13:54, Paolo Bonzini wrote:
>
>> >These are replacements for blk_aio_preadv and blk_aio_pwritev that allow
>> >customization of the data path.  They reuse the DMA helpers' DMAIOFunc
>> >callback type, so that the same function can be used in either the
>> >QEMUSGList or the bounce-buffered case.
>> >
>> >This customization will be needed in the next patch to do zero-copy
>> >SG_IO on scsi-block.
>> >
>> >Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>> >---
>> >  hw/scsi/scsi-disk.c | 63 +++++++++++++++++++++++++++++++++++++++++++----------
>> >  1 file changed, 52 insertions(+), 11 deletions(-)
>> >
> Hi Paolo,
>
> This patch appears to break qemu-system-sparc booting from CDROM with
> the following command line:
>
> ./qemu-system-sparc -cdrom debian-40r4a-sparc-netinst.iso -boot d
>
> Instead of booting straight into SILO, OpenBIOS hangs when trying to
> read from the CDROM device.
Paolo:
   By using git bisect on master branch , I found this patch also break 
qemu-system-arm from booting.
   command line:
  qemu-system-arm -M versatilepb -kernel vmlinuz-3.2.0-4-versatile 
-initrd initrd.img-3.2.0-4-versatile -hda 
/home/hitmoon/debian_wheezy_armel_standard.qcow2 -append 'root=/dev/sda1'

  The booting process stops at mounting the root partition and after 
timeout droped into a initramfs shell. The device '/dev/sda1' is 
presented . I guess it can not read properly from sda1.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev
  2016-06-03  2:56     ` xiaoqiang zhao
@ 2016-06-03  5:22       ` Mark Cave-Ayland
  2016-06-03  6:07         ` xiaoqiang zhao
  0 siblings, 1 reply; 29+ messages in thread
From: Mark Cave-Ayland @ 2016-06-03  5:22 UTC (permalink / raw)
  To: xiaoqiang zhao, Paolo Bonzini, qemu-devel; +Cc: qemu-block

On 03/06/16 03:56, xiaoqiang zhao wrote:

> 在 2016年06月02日 03:07, Mark Cave-Ayland 写道:
>> On 23/05/16 13:54, Paolo Bonzini wrote:
>>
>>> >These are replacements for blk_aio_preadv and blk_aio_pwritev that
>>> allow
>>> >customization of the data path.  They reuse the DMA helpers' DMAIOFunc
>>> >callback type, so that the same function can be used in either the
>>> >QEMUSGList or the bounce-buffered case.
>>> >
>>> >This customization will be needed in the next patch to do zero-copy
>>> >SG_IO on scsi-block.
>>> >
>>> >Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>> >---
>>> >  hw/scsi/scsi-disk.c | 63
>>> +++++++++++++++++++++++++++++++++++++++++++----------
>>> >  1 file changed, 52 insertions(+), 11 deletions(-)
>>> >
>> Hi Paolo,
>>
>> This patch appears to break qemu-system-sparc booting from CDROM with
>> the following command line:
>>
>> ./qemu-system-sparc -cdrom debian-40r4a-sparc-netinst.iso -boot d
>>
>> Instead of booting straight into SILO, OpenBIOS hangs when trying to
>> read from the CDROM device.
> Paolo:
>   By using git bisect on master branch , I found this patch also break
> qemu-system-arm from booting.
>   command line:
>  qemu-system-arm -M versatilepb -kernel vmlinuz-3.2.0-4-versatile
> -initrd initrd.img-3.2.0-4-versatile -hda
> /home/hitmoon/debian_wheezy_armel_standard.qcow2 -append 'root=/dev/sda1'
> 
>  The booting process stops at mounting the root partition and after
> timeout droped into a initramfs shell. The device '/dev/sda1' is
> presented . I guess it can not read properly from sda1.

I've just sent through a patch which fixes the issue for me - please
test and report back!

Paolo - not sure if it's worth a follow-up patch that renames the
relevant _readv/_writev functions in scsi-disk.c to _preadv/_pwritev to
try and help avoid such confusion in future?


ATB,

Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev
  2016-06-03  5:22       ` Mark Cave-Ayland
@ 2016-06-03  6:07         ` xiaoqiang zhao
  0 siblings, 0 replies; 29+ messages in thread
From: xiaoqiang zhao @ 2016-06-03  6:07 UTC (permalink / raw)
  To: Mark Cave-Ayland, Paolo Bonzini, qemu-devel; +Cc: qemu-block



在 2016年06月03日 13:22, Mark Cave-Ayland 写道:
> On 03/06/16 03:56, xiaoqiang zhao wrote:
>
>> 在 2016年06月02日 03:07, Mark Cave-Ayland 写道:
>>> On 23/05/16 13:54, Paolo Bonzini wrote:
>>>
>>>>> These are replacements for blk_aio_preadv and blk_aio_pwritev that
>>>> allow
>>>>> customization of the data path.  They reuse the DMA helpers' DMAIOFunc
>>>>> callback type, so that the same function can be used in either the
>>>>> QEMUSGList or the bounce-buffered case.
>>>>>
>>>>> This customization will be needed in the next patch to do zero-copy
>>>>> SG_IO on scsi-block.
>>>>>
>>>>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>>>> ---
>>>>>   hw/scsi/scsi-disk.c | 63
>>>> +++++++++++++++++++++++++++++++++++++++++++----------
>>>>>   1 file changed, 52 insertions(+), 11 deletions(-)
>>>>>
>>> Hi Paolo,
>>>
>>> This patch appears to break qemu-system-sparc booting from CDROM with
>>> the following command line:
>>>
>>> ./qemu-system-sparc -cdrom debian-40r4a-sparc-netinst.iso -boot d
>>>
>>> Instead of booting straight into SILO, OpenBIOS hangs when trying to
>>> read from the CDROM device.
>> Paolo:
>>    By using git bisect on master branch , I found this patch also break
>> qemu-system-arm from booting.
>>    command line:
>>   qemu-system-arm -M versatilepb -kernel vmlinuz-3.2.0-4-versatile
>> -initrd initrd.img-3.2.0-4-versatile -hda
>> /home/hitmoon/debian_wheezy_armel_standard.qcow2 -append 'root=/dev/sda1'
>>
>>   The booting process stops at mounting the root partition and after
>> timeout droped into a initramfs shell. The device '/dev/sda1' is
>> presented . I guess it can not read properly from sda1.
> I've just sent through a patch which fixes the issue for me - please
> test and report back!
>
> Paolo - not sure if it's worth a follow-up patch that renames the
> relevant _readv/_writev functions in scsi-disk.c to _preadv/_pwritev to
> try and help avoid such confusion in future?
>
>
> ATB,
>
> Mark.
>
>
Mark:
   I can confirm it works for me !

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-06-03  6:07 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-23 12:54 [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Paolo Bonzini
2016-05-23 12:54 ` [Qemu-devel] [PATCH 1/7] dma-helpers: change interface to byte-based Paolo Bonzini
2016-05-23 15:06   ` Eric Blake
2016-05-23 12:54 ` [Qemu-devel] [PATCH 2/7] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
2016-05-23 15:43   ` Eric Blake
2016-05-24  2:47   ` Fam Zheng
2016-05-24  7:05     ` Paolo Bonzini
2016-05-23 12:54 ` [Qemu-devel] [PATCH 3/7] scsi-disk: introduce a common base class Paolo Bonzini
2016-05-23 15:53   ` Eric Blake
2016-05-23 12:54 ` [Qemu-devel] [PATCH 4/7] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
2016-05-23 16:09   ` Eric Blake
2016-06-01 19:07   ` Mark Cave-Ayland
2016-06-03  2:56     ` xiaoqiang zhao
2016-06-03  5:22       ` Mark Cave-Ayland
2016-06-03  6:07         ` xiaoqiang zhao
2016-05-23 12:54 ` [Qemu-devel] [PATCH 5/7] scsi-disk: add need_fua_emulation to SCSIDiskClass Paolo Bonzini
2016-05-23 16:34   ` Eric Blake
2016-05-24  3:04   ` Fam Zheng
2016-05-24  7:02     ` Paolo Bonzini
2016-05-23 12:54 ` [Qemu-devel] [PATCH 6/7] scsi-disk: introduce scsi_disk_req_check_error Paolo Bonzini
2016-05-23 19:16   ` Eric Blake
2016-05-23 12:54 ` [Qemu-devel] [PATCH 7/7] scsi-block: always use SG_IO Paolo Bonzini
2016-05-23 19:49   ` Eric Blake
2016-05-23 19:36 ` [Qemu-devel] [PATCH v2 0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block Mark Cave-Ayland
2016-05-24 22:59   ` Mark Cave-Ayland
2016-05-25  8:45     ` Paolo Bonzini
2016-05-25  9:13       ` Mark Cave-Ayland
2016-05-25 10:11         ` Paolo Bonzini
2016-05-25 16:38 ` [Qemu-devel] [Qemu-block] " Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.