All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] scsi-block: receive the right SCSI status on reads and writes
@ 2016-05-11 11:41 Paolo Bonzini
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 1/4] scsi-disk: introduce a common base class Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-05-11 11:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: hare, vrozenfe, famz

scsi-block uses the block layer for reads and writes in order to avoid
allocating bounce buffers as big as the transferred data.  We know how
to split a large transfer to multiple reads and writes, and thus we can
use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).

Unfortunately, this has the side effect of eating the SCSI status except
in the very few cases where we can convert an errno code back to a SCSI
status.  It puts a big wrench in persistent reservations support in the
guest, for example.

Luckily, splitting a large transfer into multiple SBC commands is just as
easy, and this is what the last patch does.  It takes the original CDB,
patches in a modified starting sector and sector count, and executes the
SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
to SG_IO, so that s/g SCSI hosts keep the performance.

The other patches are preparatory work.  Patch 2 modifies the DMA helpers
to make the API more flexible, because the SCSI subsystem would like to
have a SCSIRequest pointer passed to the I/O function instead of the
BlockBackend pointer.  Patches 1 and 3 introduce the two callbacks that
scsi-block can override.

This conflicts with Eric's work on the byte-based API; I plan to rebase it
once those patches are in.  In the meanwhile, I would especially appreciate
the review of patch 2.

Paolo Bonzini (4):
  scsi-disk: introduce a common base class
  dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  scsi-disk: introduce dma_readv and dma_writev
  scsi-block: always use SG_IO

 dma-helpers.c        |  51 +++++++++---
 hw/ide/core.c        |  12 +--
 hw/ide/internal.h    |   4 +-
 hw/ide/macio.c       |   4 +-
 hw/scsi/scsi-disk.c  | 229 ++++++++++++++++++++++++++++++++++++++++++---------
 include/sysemu/dma.h |  11 +--
 6 files changed, 247 insertions(+), 64 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 1/4] scsi-disk: introduce a common base class
  2016-05-11 11:41 [Qemu-devel] [PATCH 0/4] scsi-block: receive the right SCSI status on reads and writes Paolo Bonzini
@ 2016-05-11 11:41 ` Paolo Bonzini
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 2/4] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-05-11 11:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: hare, vrozenfe, famz

This will be the place to add DMAIOFuncs in the next patch.  There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index c3ce54a..45b604f 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -53,6 +53,8 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 #define DEFAULT_MAX_UNMAP_SIZE      (1 << 30)   /* 1 GB */
 #define DEFAULT_MAX_IO_SIZE         INT_MAX     /* 2 GB - 1 block */
 
+#define TYPE_SCSI_DISK_BASE         "scsi-disk-base"
+
 typedef struct SCSIDiskState SCSIDiskState;
 
 typedef struct SCSIDiskReq {
@@ -2655,6 +2657,21 @@ static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
 
 #endif
 
+static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->fw_name = "disk";
+    dc->reset = scsi_disk_reset;
+}
+
+static const TypeInfo scsi_disk_base_info = {
+    .name          = TYPE_SCSI_DISK_BASE,
+    .parent        = TYPE_SCSI_DEVICE,
+    .class_init    = scsi_disk_base_class_initfn,
+    .instance_size = sizeof(SCSIDiskState),
+};
+
 #define DEFINE_SCSI_DISK_PROPERTIES()                                \
     DEFINE_BLOCK_PROPERTIES(SCSIDiskState, qdev.conf),               \
     DEFINE_PROP_STRING("ver", SCSIDiskState, version),               \
@@ -2702,17 +2719,14 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
     sc->realize      = scsi_hd_realize;
     sc->alloc_req    = scsi_new_request;
     sc->unit_attention_reported = scsi_disk_unit_attention_reported;
-    dc->fw_name = "disk";
     dc->desc = "virtual SCSI disk";
-    dc->reset = scsi_disk_reset;
     dc->props = scsi_hd_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static const TypeInfo scsi_hd_info = {
     .name          = "scsi-hd",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_hd_class_initfn,
 };
 
@@ -2734,17 +2748,14 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void *data)
     sc->realize      = scsi_cd_realize;
     sc->alloc_req    = scsi_new_request;
     sc->unit_attention_reported = scsi_disk_unit_attention_reported;
-    dc->fw_name = "disk";
     dc->desc = "virtual SCSI CD-ROM";
-    dc->reset = scsi_disk_reset;
     dc->props = scsi_cd_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static const TypeInfo scsi_cd_info = {
     .name          = "scsi-cd",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_cd_class_initfn,
 };
 
@@ -2762,17 +2773,14 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
     sc->realize      = scsi_block_realize;
     sc->alloc_req    = scsi_block_new_request;
     sc->parse_cdb    = scsi_block_parse_cdb;
-    dc->fw_name = "disk";
     dc->desc = "SCSI block device passthrough";
-    dc->reset = scsi_disk_reset;
     dc->props = scsi_block_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static const TypeInfo scsi_block_info = {
     .name          = "scsi-block",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_block_class_initfn,
 };
 #endif
@@ -2810,13 +2818,13 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
 
 static const TypeInfo scsi_disk_info = {
     .name          = "scsi-disk",
-    .parent        = TYPE_SCSI_DEVICE,
-    .instance_size = sizeof(SCSIDiskState),
+    .parent        = TYPE_SCSI_DISK_BASE,
     .class_init    = scsi_disk_class_initfn,
 };
 
 static void scsi_disk_register_types(void)
 {
+    type_register_static(&scsi_disk_base_info);
     type_register_static(&scsi_hd_info);
     type_register_static(&scsi_cd_info);
 #ifdef __linux__
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 2/4] dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  2016-05-11 11:41 [Qemu-devel] [PATCH 0/4] scsi-block: receive the right SCSI status on reads and writes Paolo Bonzini
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 1/4] scsi-disk: introduce a common base class Paolo Bonzini
@ 2016-05-11 11:41 ` Paolo Bonzini
  2016-05-17  2:40   ` Fam Zheng
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 4/4] scsi-block: always use SG_IO Paolo Bonzini
  3 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2016-05-11 11:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: hare, vrozenfe, famz

Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
because the original callback and opaque are gone by the time DMAIOFunc
is called.  On the other hand, the BlockBackend is usually derived
from those extra data that you could pass to the DMAIOFunc (in the
next patch, that would be the SCSIRequest).

So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
and blk_aio_writev's.  The new prototype loses the BlockBackend
and gains an extra opaque value which, in the case of dma_blk_readv
and dma_blk_writev, is of course used for the BlockBackend.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 dma-helpers.c        | 51 ++++++++++++++++++++++++++++++++++++++-------------
 hw/ide/core.c        | 12 +++++++-----
 hw/ide/internal.h    |  4 ++--
 hw/ide/macio.c       |  4 ++--
 include/sysemu/dma.h | 11 ++++++-----
 5 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 4ad0bca..000620e 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -70,7 +70,7 @@ void qemu_sglist_destroy(QEMUSGList *qsg)
 
 typedef struct {
     BlockAIOCB common;
-    BlockBackend *blk;
+    AioContext *ctx;
     BlockAIOCB *acb;
     QEMUSGList *sg;
     uint64_t sector_num;
@@ -80,6 +80,7 @@ typedef struct {
     QEMUIOVector iov;
     QEMUBH *bh;
     DMAIOFunc *io_func;
+    void *io_func_opaque;
 } DMAAIOCB;
 
 static void dma_blk_cb(void *opaque, int ret);
@@ -154,8 +155,7 @@ static void dma_blk_cb(void *opaque, int ret)
 
     if (dbs->iov.size == 0) {
         trace_dma_map_wait(dbs);
-        dbs->bh = aio_bh_new(blk_get_aio_context(dbs->blk),
-                             reschedule_dma, dbs);
+        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
         cpu_register_map_client(dbs->bh);
         return;
     }
@@ -164,8 +164,9 @@ static void dma_blk_cb(void *opaque, int ret)
         qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
     }
 
-    dbs->acb = dbs->io_func(dbs->blk, dbs->sector_num, &dbs->iov,
-                            dbs->iov.size / 512, dma_blk_cb, dbs);
+    dbs->acb = dbs->io_func(dbs->sector_num, &dbs->iov,
+                            dbs->iov.size / 512, dma_blk_cb, dbs,
+                            dbs->io_func_opaque);
     assert(dbs->acb);
 }
 
@@ -191,23 +192,25 @@ static const AIOCBInfo dma_aiocb_info = {
     .cancel_async       = dma_aio_cancel,
 };
 
-BlockAIOCB *dma_blk_io(
-    BlockBackend *blk, QEMUSGList *sg, uint64_t sector_num,
-    DMAIOFunc *io_func, BlockCompletionFunc *cb,
+BlockAIOCB *dma_blk_io(AioContext *ctx,
+    QEMUSGList *sg, uint64_t sector_num,
+    DMAIOFunc *io_func, void *io_func_opaque,
+    BlockCompletionFunc *cb,
     void *opaque, DMADirection dir)
 {
-    DMAAIOCB *dbs = blk_aio_get(&dma_aiocb_info, blk, cb, opaque);
+    DMAAIOCB *dbs = qemu_aio_get(&dma_aiocb_info, NULL, cb, opaque);
 
-    trace_dma_blk_io(dbs, blk, sector_num, (dir == DMA_DIRECTION_TO_DEVICE));
+    trace_dma_blk_io(dbs, io_func_opaque, sector_num, (dir == DMA_DIRECTION_TO_DEVICE));
 
     dbs->acb = NULL;
-    dbs->blk = blk;
     dbs->sg = sg;
+    dbs->ctx = ctx;
     dbs->sector_num = sector_num;
     dbs->sg_cur_index = 0;
     dbs->sg_cur_byte = 0;
     dbs->dir = dir;
     dbs->io_func = io_func;
+    dbs->io_func_opaque = io_func_opaque;
     dbs->bh = NULL;
     qemu_iovec_init(&dbs->iov, sg->nsg);
     dma_blk_cb(dbs, 0);
@@ -215,19 +218,41 @@ BlockAIOCB *dma_blk_io(
 }
 
 
+static
+BlockAIOCB *dma_blk_read_io_func(int64_t sector_num,
+                                 QEMUIOVector *iov, int nb_sectors,
+                                 BlockCompletionFunc *cb, void *cb_opaque,
+                                 void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_readv(blk, sector_num, iov, nb_sectors, cb, cb_opaque);
+}
+
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
                          QEMUSGList *sg, uint64_t sector,
                          void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_blk_io(blk, sg, sector, blk_aio_readv, cb, opaque,
+    return dma_blk_io(blk_get_aio_context(blk),
+                      sg, sector, dma_blk_read_io_func, blk, cb, opaque,
                       DMA_DIRECTION_FROM_DEVICE);
 }
 
+static
+BlockAIOCB *dma_blk_write_io_func(int64_t sector_num,
+                                  QEMUIOVector *iov, int nb_sectors,
+                                  BlockCompletionFunc *cb, void *cb_opaque,
+                                  void *opaque)
+{
+    BlockBackend *blk = opaque;
+    return blk_aio_writev(blk, sector_num, iov, nb_sectors, cb, cb_opaque);
+}
+
 BlockAIOCB *dma_blk_write(BlockBackend *blk,
                           QEMUSGList *sg, uint64_t sector,
                           void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_blk_io(blk, sg, sector, blk_aio_writev, cb, opaque,
+    return dma_blk_io(blk_get_aio_context(blk),
+                      sg, sector, dma_blk_write_io_func, blk, cb, opaque,
                       DMA_DIRECTION_TO_DEVICE);
 }
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 41e6a2d..4057924 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -441,13 +441,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
     }
 }
 
-BlockAIOCB *ide_issue_trim(BlockBackend *blk,
+BlockAIOCB *ide_issue_trim(
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-        BlockCompletionFunc *cb, void *opaque)
+        BlockCompletionFunc *cb, void *cb_opaque, void *opaque)
 {
+    BlockBackend *blk = opaque;
     TrimAIOCB *iocb;
 
-    iocb = blk_aio_get(&trim_aiocb_info, blk, cb, opaque);
+    iocb = blk_aio_get(&trim_aiocb_info, blk, cb, cb_opaque);
     iocb->blk = blk;
     iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
     iocb->ret = 0;
@@ -869,8 +870,9 @@ static void ide_dma_cb(void *opaque, int ret)
                                            ide_dma_cb, s);
         break;
     case IDE_DMA_TRIM:
-        s->bus->dma->aiocb = dma_blk_io(s->blk, &s->sg, sector_num,
-                                        ide_issue_trim, ide_dma_cb, s,
+        s->bus->dma->aiocb = dma_blk_io(blk_get_aio_context(s->blk),
+                                        &s->sg, sector_num,
+                                        ide_issue_trim, s->blk, ide_dma_cb, s,
                                         DMA_DIRECTION_TO_DEVICE);
         break;
     default:
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index d2c458f..1a3f3e8 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -613,9 +613,9 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
                         EndTransferFunc *end_transfer_func);
 void ide_transfer_stop(IDEState *s);
 void ide_set_inactive(IDEState *s, bool more);
-BlockAIOCB *ide_issue_trim(BlockBackend *blk,
+BlockAIOCB *ide_issue_trim(
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-        BlockCompletionFunc *cb, void *opaque);
+        BlockCompletionFunc *cb, void *cb_opaque, void *opaque);
 BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
                                QEMUIOVector *iov, int nb_sectors,
                                BlockCompletionFunc *cb, void *opaque);
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 76256eb..9773923 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -232,8 +232,8 @@ static void pmac_dma_trim(BlockBackend *blk,
     s->io_buffer_index += io->len;
     io->len = 0;
 
-    s->bus->dma->aiocb = ide_issue_trim(blk, (offset >> 9), &io->iov,
-                             (bytes >> 9), cb, io);
+    s->bus->dma->aiocb = ide_issue_trim((offset >> 9), &io->iov,
+                             (bytes >> 9), cb, io, blk);
 }
 
 static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index b0fbb9b..ddc4afc 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -197,14 +197,15 @@ void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 #endif
 
-typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t sector_num,
+typedef BlockAIOCB *DMAIOFunc(int64_t sector_num,
                               QEMUIOVector *iov, int nb_sectors,
-                              BlockCompletionFunc *cb, void *opaque);
+                              BlockCompletionFunc *cb, void *cb_opaque,
+                              void *opaque);
 
-BlockAIOCB *dma_blk_io(BlockBackend *blk,
+BlockAIOCB *dma_blk_io(AioContext *ctx,
                        QEMUSGList *sg, uint64_t sector_num,
-                       DMAIOFunc *io_func, BlockCompletionFunc *cb,
-                       void *opaque, DMADirection dir);
+                       DMAIOFunc *io_func, void *io_func_opaque,
+                       BlockCompletionFunc *cb, void *opaque, DMADirection dir);
 BlockAIOCB *dma_blk_read(BlockBackend *blk,
                          QEMUSGList *sg, uint64_t sector,
                          BlockCompletionFunc *cb, void *opaque);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev
  2016-05-11 11:41 [Qemu-devel] [PATCH 0/4] scsi-block: receive the right SCSI status on reads and writes Paolo Bonzini
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 1/4] scsi-disk: introduce a common base class Paolo Bonzini
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 2/4] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
@ 2016-05-11 11:41 ` Paolo Bonzini
  2016-05-19 15:11   ` Eric Blake
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 4/4] scsi-block: always use SG_IO Paolo Bonzini
  3 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2016-05-11 11:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: hare, vrozenfe, famz

These are replacements for blk_aio_readv and blk_aio_writev that allow
customization of the data path.  They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.

This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/scsi-disk.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 45b604f..3ecfc46 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -55,7 +55,21 @@ do { printf("scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 
 #define TYPE_SCSI_DISK_BASE         "scsi-disk-base"
 
+#define SCSI_DISK_BASE(obj) \
+     OBJECT_CHECK(SCSIDiskState, (obj), TYPE_SCSI_DISK_BASE)
+#define SCSI_DISK_BASE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(SCSIDiskClass, (klass), TYPE_SCSI_DISK_BASE)
+#define SCSI_DISK_BASE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(SCSIDiskClass, (obj), TYPE_SCSI_DISK_BASE)
+
 typedef struct SCSIDiskState SCSIDiskState;
+typedef struct SCSIDiskClass SCSIDiskClass;
+
+typedef struct SCSIDiskClass {
+    SCSIDeviceClass parent_class;
+    DMAIOFunc       *dma_readv;
+    DMAIOFunc       *dma_writev;
+} SCSIDiskClass;
 
 typedef struct SCSIDiskReq {
     SCSIRequest req;
@@ -318,6 +332,7 @@ done:
 static void scsi_do_read(SCSIDiskReq *r, int ret)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
     uint32_t n;
 
     assert (r->req.aiocb == NULL);
@@ -339,14 +354,16 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_READ);
         r->req.resid -= r->req.sg->size;
-        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
-                                    scsi_dma_complete, r);
+        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
+                                  r->req.sg, r->sector,
+                                  sdc->dma_readv, r, scsi_dma_complete, r,
+                                  DMA_DIRECTION_FROM_DEVICE);
     } else {
         n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
                          n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-        r->req.aiocb = blk_aio_readv(s->qdev.conf.blk, r->sector, &r->qiov, n,
-                                     scsi_read_complete, r);
+        r->req.aiocb = sdc->dma_readv(r->sector, &r->qiov, n,
+                                      scsi_read_complete, r, r);
     }
 
 done:
@@ -506,6 +523,7 @@ static void scsi_write_data(SCSIRequest *req)
 {
     SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
     uint32_t n;
 
     /* No data transfer may already be in progress */
@@ -543,14 +561,16 @@ static void scsi_write_data(SCSIRequest *req)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE);
         r->req.resid -= r->req.sg->size;
-        r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg, r->sector,
-                                     scsi_dma_complete, r);
+        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
+                                  r->req.sg, r->sector,
+                                  sdc->dma_writev, r, scsi_dma_complete, r,
+                                  DMA_DIRECTION_TO_DEVICE);
     } else {
         n = r->qiov.size / 512;
         block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
                          n * BDRV_SECTOR_SIZE, BLOCK_ACCT_WRITE);
-        r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, r->sector, &r->qiov, n,
-                                      scsi_write_complete, r);
+        r->req.aiocb = sdc->dma_writev(r->sector, &r->qiov, n,
+                                       scsi_write_complete, r, r);
     }
 }
 
@@ -2657,12 +2677,39 @@ static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd,
 
 #endif
 
+static
+BlockAIOCB *scsi_dma_readv(int64_t sector_num,
+                           QEMUIOVector *iov, int nb_sectors,
+                           BlockCompletionFunc *cb, void *cb_opaque,
+                           void *opaque)
+{
+    SCSIDiskReq *r = opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    return blk_aio_readv(s->qdev.conf.blk, sector_num, iov, nb_sectors,
+                         cb, cb_opaque);
+}
+
+static
+BlockAIOCB *scsi_dma_writev(int64_t sector_num,
+                           QEMUIOVector *iov, int nb_sectors,
+                           BlockCompletionFunc *cb, void *cb_opaque,
+                           void *opaque)
+{
+    SCSIDiskReq *r = opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    return blk_aio_writev(s->qdev.conf.blk, sector_num, iov, nb_sectors,
+                          cb, cb_opaque);
+}
+
 static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
 
     dc->fw_name = "disk";
     dc->reset = scsi_disk_reset;
+    sdc->dma_readv = scsi_dma_readv;
+    sdc->dma_writev = scsi_dma_writev;
 }
 
 static const TypeInfo scsi_disk_base_info = {
@@ -2670,6 +2717,7 @@ static const TypeInfo scsi_disk_base_info = {
     .parent        = TYPE_SCSI_DEVICE,
     .class_init    = scsi_disk_base_class_initfn,
     .instance_size = sizeof(SCSIDiskState),
+    .class_size    = sizeof(SCSIDiskClass),
 };
 
 #define DEFINE_SCSI_DISK_PROPERTIES()                                \
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 4/4] scsi-block: always use SG_IO
  2016-05-11 11:41 [Qemu-devel] [PATCH 0/4] scsi-block: receive the right SCSI status on reads and writes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
@ 2016-05-11 11:41 ` Paolo Bonzini
  3 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-05-11 11:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: hare, vrozenfe, famz

Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status.  This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT.  Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.

This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.

Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        This version of the patch has an overflow if, for example, a
        READ(12) command straddles the LBA 2^32 boundary and is split after
        the LBA 2^32 boundary.  In this case, the second part needs to use
        a READ(16) command.  I'll fix this before posting the final version.

 hw/scsi/scsi-disk.c | 129 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 114 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 3ecfc46..9374a1a 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2599,6 +2599,97 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
     scsi_generic_read_device_identification(&s->qdev);
 }
 
+typedef struct SCSIBlockReq {
+    SCSIDiskReq req;
+    sg_io_hdr_t io_header;
+    uint8_t cdb[SCSI_CMD_BUF_SIZE];
+} SCSIBlockReq;
+
+static BlockAIOCB *scsi_block_do_sgio(SCSIBlockReq *req,
+                                      int64_t sector_num, QEMUIOVector *iov,
+                                      int nb_sectors, int direction,
+                                      BlockCompletionFunc *cb, void *opaque)
+{
+    sg_io_hdr_t *io_header = &req->io_header;
+    SCSIDiskReq *r = &req->req;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+    int nb_logical_blocks;
+    uint32_t word0;
+    uint64_t lba;
+    BlockAIOCB *aiocb;
+
+    io_header->interface_id = 'S';
+
+    /* The data transfer comes from the QEMUIOVector.  */
+    io_header->dxfer_direction = direction;
+    io_header->dxfer_len = nb_sectors * BDRV_SECTOR_SIZE;
+    io_header->dxferp = (void *)iov->iov;
+    io_header->iovec_count = iov->niov;
+    assert(io_header->iovec_count == iov->niov); /* no overflow! */
+
+    /* The CDB needs to have the LBA and length patched in, in case
+     * DMA helpers split the transfer in multiple segments.
+     */
+    io_header->cmdp = req->cdb;
+    io_header->cmd_len = r->req.cmd.len;
+    lba = sector_num / (s->qdev.blocksize / BDRV_SECTOR_SIZE);
+    nb_logical_blocks = io_header->dxfer_len / s->qdev.blocksize;
+    switch(req->cdb[0] >> 5) {
+    case 0:
+        /* 6-byte CDB */
+        word0 = deposit32(ldl_be_p(&req->cdb[0]), 0, 21, lba);
+        stl_be_p(&req->cdb[0], lba | word0);
+        req->cdb[4] = nb_logical_blocks;
+    case 1:
+    case 2:
+        /* 10-byte CDB */
+        stl_be_p(&req->cdb[2], lba);
+        stw_be_p(&req->cdb[7], nb_logical_blocks);
+        break;
+    case 4:
+        /* 16-byte CDB */
+        stq_be_p(&req->cdb[2], lba);
+        stl_be_p(&req->cdb[10], nb_logical_blocks);
+        break;
+    case 5:
+        /* 12-byte CDB */
+        stl_be_p(&req->cdb[2], lba);
+        stl_be_p(&req->cdb[6], nb_logical_blocks);
+        break;
+    }
+
+    /* The rest is as in scsi-generic.c.  */
+    io_header->mx_sb_len = sizeof(r->req.sense);
+    io_header->sbp = r->req.sense;
+    io_header->timeout = UINT_MAX;
+    io_header->usr_ptr = r;
+    io_header->flags |= SG_FLAG_DIRECT_IO;
+
+    aiocb = blk_aio_ioctl(s->qdev.conf.blk, SG_IO, io_header, cb, opaque);
+    assert(aiocb != NULL);
+    return aiocb;
+}
+
+static BlockAIOCB *scsi_block_dma_readv(int64_t sector_num,
+                                        QEMUIOVector *iov, int nb_sectors,
+                                        BlockCompletionFunc *cb, void *cb_opaque,
+                                        void *opaque)
+{
+    SCSIBlockReq *r = opaque;
+    return scsi_block_do_sgio(r, sector_num, iov, nb_sectors,
+                              SG_DXFER_FROM_DEV, cb, cb_opaque);
+}
+
+static BlockAIOCB *scsi_block_dma_writev(int64_t sector_num,
+                                         QEMUIOVector *iov, int nb_sectors,
+                                         BlockCompletionFunc *cb, void *cb_opaque,
+                                         void *opaque)
+{
+    SCSIBlockReq *r = opaque;
+    return scsi_block_do_sgio(r, sector_num, iov, nb_sectors,
+                              SG_DXFER_TO_DEV, cb, cb_opaque);
+}
+
 static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
 {
     switch (buf[0]) {
@@ -2616,21 +2707,8 @@ static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
     case WRITE_VERIFY_10:
     case WRITE_VERIFY_12:
     case WRITE_VERIFY_16:
-        /* If we are not using O_DIRECT, we might read stale data from the
-         * host cache if writes were made using other commands than these
-         * ones (such as WRITE SAME or EXTENDED COPY, etc.).  So, without
-         * O_DIRECT everything must go through SG_IO.
-         */
-        if (!(blk_get_flags(s->qdev.conf.blk) & BDRV_O_NOCACHE)) {
-            break;
-        }
-
-        /* MMC writing cannot be done via pread/pwrite, because it sometimes
+        /* MMC writing cannot be done via DMA helpers, because it sometimes
          * involves writing beyond the maximum LBA or to negative LBA (lead-in).
-         * And once you do these writes, reading from the block device is
-         * unreliable, too.  It is even possible that reads deliver random data
-         * from the host page cache (this is probably a Linux bug).
-         *
          * We might use scsi_disk_dma_reqops as long as no writing commands are
          * seen, but performance usually isn't paramount on optical media.  So,
          * just make scsi-block operate the same as scsi-generic for them.
@@ -2648,6 +2726,24 @@ static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
 }
 
 
+static int32_t scsi_block_dma_command(SCSIRequest *req, uint8_t *buf)
+{
+    SCSIBlockReq *r = (SCSIBlockReq *)req;
+    memcpy(r->cdb, req->cmd.buf, sizeof(r->cdb));
+    return scsi_disk_dma_command(req, buf);
+}
+
+static const SCSIReqOps scsi_block_dma_reqops = {
+    .size         = sizeof(SCSIBlockReq),
+    .free_req     = scsi_free_request,
+    .send_command = scsi_block_dma_command,
+    .read_data    = scsi_read_data,
+    .write_data   = scsi_write_data,
+    .get_buf      = scsi_get_buf,
+    .load_request = scsi_disk_load_request,
+    .save_request = scsi_disk_save_request,
+};
+
 static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
                                            uint32_t lun, uint8_t *buf,
                                            void *hba_private)
@@ -2658,7 +2754,7 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
         return scsi_req_alloc(&scsi_generic_req_ops, &s->qdev, tag, lun,
                               hba_private);
     } else {
-        return scsi_req_alloc(&scsi_disk_dma_reqops, &s->qdev, tag, lun,
+        return scsi_req_alloc(&scsi_block_dma_reqops, &s->qdev, tag, lun,
                               hba_private);
     }
 }
@@ -2817,10 +2913,13 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
+    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
 
     sc->realize      = scsi_block_realize;
     sc->alloc_req    = scsi_block_new_request;
     sc->parse_cdb    = scsi_block_parse_cdb;
+    sdc->dma_readv   = scsi_block_dma_readv;
+    sdc->dma_writev  = scsi_block_dma_writev;
     dc->desc = "SCSI block device passthrough";
     dc->props = scsi_block_properties;
     dc->vmsd  = &vmstate_scsi_disk_state;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 2/4] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
@ 2016-05-17  2:40   ` Fam Zheng
  0 siblings, 0 replies; 8+ messages in thread
From: Fam Zheng @ 2016-05-17  2:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, hare, vrozenfe

On Wed, 05/11 13:41, Paolo Bonzini wrote:
> diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
> index b0fbb9b..ddc4afc 100644
> --- a/include/sysemu/dma.h
> +++ b/include/sysemu/dma.h
> @@ -197,14 +197,15 @@ void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
>  void qemu_sglist_destroy(QEMUSGList *qsg);
>  #endif
>  
> -typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t sector_num,
> +typedef BlockAIOCB *DMAIOFunc(int64_t sector_num,
>                                QEMUIOVector *iov, int nb_sectors,
> -                              BlockCompletionFunc *cb, void *opaque);
> +                              BlockCompletionFunc *cb, void *cb_opaque,
> +                              void *opaque);
>  

The patch itself looks good, but I'm wondering if it's time to introduce:

    typedef struct {
        BlockCompletionFunc *cb;
        void *opaque;
    } BlockCompletion;

for more readability in such cases.

Fam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev
  2016-05-11 11:41 ` [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
@ 2016-05-19 15:11   ` Eric Blake
  2016-05-19 15:13     ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Blake @ 2016-05-19 15:11 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: famz, hare, vrozenfe

[-- Attachment #1: Type: text/plain, Size: 2063 bytes --]

On 05/11/2016 05:41 AM, Paolo Bonzini wrote:
> These are replacements for blk_aio_readv and blk_aio_writev that allow
> customization of the data path.  They reuse the DMA helpers' DMAIOFunc
> callback type, so that the same function can be used in either the
> QEMUSGList or the bounce-buffered case.
> 
> This customization will be needed in the next patch to do zero-copy
> SG_IO on scsi-block.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 56 insertions(+), 8 deletions(-)
> 

> @@ -339,14 +354,16 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
>      if (r->req.sg) {
>          dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_READ);
>          r->req.resid -= r->req.sg->size;
> -        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
> -                                    scsi_dma_complete, r);
> +        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
> +                                  r->req.sg, r->sector,
> +                                  sdc->dma_readv, r, scsi_dma_complete, r,
> +                                  DMA_DIRECTION_FROM_DEVICE);

Is it worth considering byte-based rather than sector-based interfaces,
as part of this series?

> +static
> +BlockAIOCB *scsi_dma_readv(int64_t sector_num,
> +                           QEMUIOVector *iov, int nb_sectors,
> +                           BlockCompletionFunc *cb, void *cb_opaque,
> +                           void *opaque)
> +{
> +    SCSIDiskReq *r = opaque;
> +    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +    return blk_aio_readv(s->qdev.conf.blk, sector_num, iov, nb_sectors,
> +                         cb, cb_opaque);
> +}

Especially since commit 7b1deac killed blk_aio_readv() in favor of
byte-based blk_aio_preadv().

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev
  2016-05-19 15:11   ` Eric Blake
@ 2016-05-19 15:13     ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-05-19 15:13 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: famz, hare, vrozenfe

[-- Attachment #1: Type: text/plain, Size: 1281 bytes --]



On 19/05/2016 17:11, Eric Blake wrote:
>> > @@ -339,14 +354,16 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
>> >      if (r->req.sg) {
>> >          dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_READ);
>> >          r->req.resid -= r->req.sg->size;
>> > -        r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
>> > -                                    scsi_dma_complete, r);
>> > +        r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
>> > +                                  r->req.sg, r->sector,
>> > +                                  sdc->dma_readv, r, scsi_dma_complete, r,
>> > +                                  DMA_DIRECTION_FROM_DEVICE);
> 
> Is it worth considering byte-based rather than sector-based interfaces,
> as part of this series?

I think it's a separate change.  I'm okay with doing the switch to
byte-based interfaces first.  Another related change would be to add a
"mask" argument, so that I could generalize this:

    if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
        qemu_iovec_discard_back(&dbs->iov,
                                dbs->iov.size & ~BDRV_SECTOR_MASK);
    }

For this series the mask needs to be the logical block size.

Thanks,

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-05-19 15:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-11 11:41 [Qemu-devel] [PATCH 0/4] scsi-block: receive the right SCSI status on reads and writes Paolo Bonzini
2016-05-11 11:41 ` [Qemu-devel] [PATCH 1/4] scsi-disk: introduce a common base class Paolo Bonzini
2016-05-11 11:41 ` [Qemu-devel] [PATCH 2/4] dma-helpers: change BlockBackend to opaque value in DMAIOFunc Paolo Bonzini
2016-05-17  2:40   ` Fam Zheng
2016-05-11 11:41 ` [Qemu-devel] [PATCH 3/4] scsi-disk: introduce dma_readv and dma_writev Paolo Bonzini
2016-05-19 15:11   ` Eric Blake
2016-05-19 15:13     ` Paolo Bonzini
2016-05-11 11:41 ` [Qemu-devel] [PATCH 4/4] scsi-block: always use SG_IO Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.